PAZATN: A Linguistic Approach to Automatic
Analysis of Elementary Programming Protocols



Mark L. Miller and Ira P. Goldstein



PATN is a design for a machine problem solver which uses an augmented transition network (ATN) to represent planning knowledge. In order to explore PATN's potential as a theory of human problem solving, a linguistic approach to protocol analysis is presented. An interpretation of a protocol is taken to be a parse tree supplemented by semantic and pragmatic annotation attached to various nodes. This paradigm has implications for constructing a cognitive model of the individual and designing computerized tutors.

Manual protocol analysis is tedious and informal; hence the design for PAZATN, an automatic protocol analyzer, is presented. PAZATN uses PATN as a generator for possible interpretations of the protocol, with bottom-up evidence biasing PATN toward plans which are likely to match the data.

PAZATN is a domain independent framework for constructing specialized protocol analyzers. To apply PAZATN to a particular task domain, event specialists (ESP's) are needed which embody syntactically organized domain knowledge. ESP's for the Logo graphics programming domain are defined and PAZATN's operation is hand-simulated on an elementary protocol for this domain.

Book I: A Linguistic Approach to Protocol Analysis

1. Introduction

1.1. SPADE: A Linguistic Theory of Design
1.2. PATN: Analysis by Synthesis
1.3. Theoretical Interpretations

1.1. SPADE: A Linguistic Theory of Design

In recent research we have developed a theory of design called SPADE which provides a model of the planning and debugging processes.¹ We contend that, in addition to being a powerful theory of machine problem solving, SPADE is also a useful framework for describing human problem solving. To support this contention, we apply the SPADE theory to the task of analyzing problem solving protocols.

By adopting this methodology we follow the precedent established in seminal protocol analysis studies conducted at Carnegie Mellon University [Newell 1966; Newell & Simon 1972; Waterman & Newell 1972, 1973; Bhaskar & Simon 1976]. Our work extends their approach along three dimensions.

1. With the exception of the recent Bhaskar & Simon effort, the CMU studies have been restricted to very limited domains such as cryptarithmetic. Rather than limiting the task domain, we limit the range of responses. Typically protocols are transcriptions of think-aloud verbalizations; we focus on the more restricted interactions arising from a problem solving session at a computer console.² The analysis task in this setting is to interpret user actions (editing, executing, tracing, etc.) in terms of the SPADE theory of planning and debugging.

2. The CMU theory centers on the production system model. Although productions are Turing universal, they tend to result in a less structured program organization than the linguistic formalisms of the SPADE theory. The PATN program, the procedural embodiment of the SPADE theory, uses an augmented transition network [Woods 1970] to represent planning knowledge.

3. CMU analyses are based on the problem behavior graph. Pursuing an analogy to computational linguistics, we define an interpretation of a protocol to be a parse tree supplemented by semantic and pragmatic annotation. The parse tree characterizes the constituent structure of the protocol. Semantic and pragmatic annotation (variables and assertions attached to nodes of the parse tree) formalize the problem description and the rationale for particular planning choices. Annotated parse trees closely reflect the local structure of PATN's linguistic problem solving machinery, leading more directly to inferences regarding individual differences than is evident from problem behavior graphs.
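The notion of an interpretation in item 3 can be pictured as a small data structure. The following is only an illustrative sketch, not the representation used by PATN or PAZATN: the class name `Node`, its field names, and the sample annotation values are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of an interpretation: a plan label plus its annotation."""
    label: str                                       # plan type, or event id at a leaf
    children: list = field(default_factory=list)     # constituent structure
    semantics: dict = field(default_factory=dict)    # variables attached to the node
    pragmatics: list = field(default_factory=list)   # assertions justifying the choice

    def fringe(self):
        """Return the leaf labels left to right: the events the tree accounts for."""
        if not self.children:
            return [self.label]
        return [leaf for child in self.children for leaf in child.fringe()]


# Hypothetical fragment: a repetition plan dominating three editing events.
well = Node(
    "repetition",
    children=[Node("E02"), Node("E03"), Node("E04")],
    semantics={"model": "(SQUARE WELL)"},
    pragmatics=["(GENERIC (:MODEL E02))"],
)
print(well.fringe())
```

The fringe of the tree is the sequence of protocol events, while the annotation travels with the internal nodes; this is the sense in which the parse tree alone is only one component of an interpretation.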



Ruven Brooks [1975] applied the CMU approach to the programming domain, developing a model of coding (the translation of high level plans into the statements of a particular programming language) and testing the model by analyzing protocols. His model is a set of production rules whose conditions match the patterns of plan elements and whose actions generate code statements. Protocols are analyzed manually, with the experimenter attempting to infer the plan which is then expanded by the production system into code paralleling that of the protocol. The processes of understanding the problem, generating the plan, and debugging are not formalized. SPADE goes beyond this in that it can be used to parse protocols and that the parse constitutes a formal hypothesis regarding not only the coding knowledge but also the planning and debugging strategies employed by the problem solver.

The paper is divided into two books. Book I develops SPADE's linguistic paradigm for protocol analysis. A prototypical elementary programming protocol is parsed, and the implications of this information processing analysis for constructing cognitive models and designing computerized tutors are discussed.

Book I does not address the question of how a protocol parse is derived. In earlier work, problem solving protocols were analyzed manually.³ However, manual analysis is tedious and informal; hence Book II presents the design for PAZATN, an automatic protocol analyzer. PAZATN uses PATN, the procedural embodiment of the SPADE theory, as a generator for possible interpretations of the protocol, with bottom-up evidence biasing PATN toward plans which are likely to match the data (figure 1:1).

PAZATN is a domain independent framework for constructing specialized protocol analyzers. To apply PAZATN to a particular task domain, event specialists (ESP's) are supplied which embody domain-specific knowledge. For concreteness, we employ examples from the Logo elementary graphics programming domain; ESP's for this domain are discussed. PAZATN's operation is hand-simulated on an elementary protocol from this domain.

1.2. PATN: Analysis by Synthesis

A major insight of the generative grammarians (e.g., Chomsky [1965]) was that it is often helpful to characterize phenomena synthetically: one devises rules to generate the phenomena. Analysis can then be viewed as a recognition process for selecting derivations from the space of synthetic possibilities. We adopt this viewpoint in analyzing protocols, with PATN as our generative formalism.
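Analysis by synthesis can be illustrated with a toy generator. The sketch below is an assumption-laden miniature, not PATN: the rule table `RULES`, the symbol names, and the function names `generate` and `analyze` are invented for the example, and the grammar is deliberately non-recursive so that exhaustive generation terminates.

```python
from itertools import product

# Hypothetical toy grammar: each plan symbol expands into alternative sequences.
RULES = {
    "SOLVE": [["IDENTIFY"], ["DECOMPOSE"]],
    "IDENTIFY": [["call"]],
    "DECOMPOSE": [["step", "step"], ["step", "call"]],
}


def generate(symbol):
    """Yield every (tree, fringe) pair derivable from symbol."""
    if symbol not in RULES:              # terminal: an observable event type
        yield symbol, [symbol]
        return
    for expansion in RULES[symbol]:
        # Combine every choice of subderivation for the parts of the expansion.
        for parts in product(*(list(generate(s)) for s in expansion)):
            tree = (symbol, [subtree for subtree, _ in parts])
            fringe = [event for _, sub in parts for event in sub]
            yield tree, fringe


def analyze(events, start="SOLVE"):
    """Recognition as selection: keep the derivations whose fringe equals the data."""
    return [tree for tree, fringe in generate(start) if fringe == events]


print(analyze(["step", "call"]))
```

Exhaustive generation is only workable for a grammar this small; the text's point that bottom-up evidence should bias the generator toward likely plans addresses exactly this cost.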

The SPADE theory, which PATN embodies, begins with a taxonomy of commonly observed planning techniques (figure 1:2). When a problem is confronted, according to the theory, one of three types of plans may be pursued. (1) The problem may be solved by identification: recognizing it as already having a solution. This planning category, seemingly trivial, is of course essential to avoid infinite regress. (2) The problem may be solved by decomposition: dividing it into smaller, easier subproblems. These are each solved separately, and then recombined, thereby disposing of the original problem. (3) Should the first two strategies fail, the problem may be solved by reformulation: redescribing it in terms that seem more amenable to solution. The reformulated problem must, in turn, be solved by identification, decomposition, or further reformulation. As the figure indicates, each of these categories of plans is further subdivided.
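The three plan types can be sketched as a recursive procedure. This is a minimal illustration under stated assumptions rather than PATN itself: the tables `LIBRARY`, `PARTS`, and `REWORDINGS` and their entries are hypothetical stand-ins for the answer library, the decomposition knowledge, and the reformulation knowledge.

```python
# Hypothetical knowledge tables for the example.
LIBRARY = {"triangle", "line", "square"}              # problems with known solutions
PARTS = {                                             # known decompositions
    "tree-and-well": ["tree", "well"],
    "tree": ["triangle", "line"],
}
REWORDINGS = {                                        # known redescriptions
    "wishingwell": "tree-and-well",
    "well": "square",
}


def solve(problem):
    """Try identification, then decomposition, then reformulation."""
    if problem in LIBRARY:
        return ("identify", problem)
    if problem in PARTS:
        return ("decompose", problem, [solve(part) for part in PARTS[problem]])
    if problem in REWORDINGS:
        # The reformulated problem is itself solved by one of the three plans.
        return ("reformulate", problem, solve(REWORDINGS[problem]))
    raise ValueError(f"no plan applies to {problem!r}")


print(solve("wishingwell"))
```

Identification is the base case of the recursion, which is the sense in which it avoids infinite regress.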

PATN (figure 1:3) is a problem solving program based on this taxonomy. PATN was derived by first representing the taxonomy as a recursive transition network. This produces a non-deterministic problem solver. Supplying precedence ordering for arcs from each node, predicates which test preconditions for transitions, and actions to be performed when arc transitions occur, produces an augmented transition network [Woods 1970] that is far more deterministic in solving problems (although backup is permitted). The predicates access registers which store semantic information about the problem; the actions modify these registers.
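The three augmentations (arc precedence, arc predicates, arc actions over registers) can be sketched as follows. The network fragment, the register names, and the helper `step` are assumptions made for illustration; they are not taken from figure 1:3, and backup is not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Arc:
    target: str                        # state reached by the transition
    test: Callable[[dict], bool]       # predicate over the registers
    action: Callable[[dict], None]     # register update performed on transition


# Hypothetical fragment: arcs are listed in precedence order for each state.
NETWORK = {
    "PLAN": [
        Arc("IDENTIFY",
            lambda r: r["problem"] in r["library"],
            lambda r: r["trace"].append("identify")),
        Arc("DECOMPOSE",
            lambda r: True,
            lambda r: r["trace"].append("decompose")),
    ],
    "IDENTIFY": [],
    "DECOMPOSE": [],
}


def step(state, registers):
    """Follow the first arc whose predicate holds; return the new state or None."""
    for arc in NETWORK[state]:
        if arc.test(registers):
            arc.action(registers)
            return arc.target
    return None


registers = {"problem": "square", "library": {"square"}, "trace": []}
print(step("PLAN", registers), registers["trace"])
```

Because the arcs are tried in a fixed order and guarded by predicates, the same network that was non-deterministic as a recursive transition network now makes a definite choice in each register context.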

PATN's solutions can exhibit rational bugs: errors arising from heuristically justifiable but incorrect planning decisions such as the trial execution of an incomplete plan, which omits necessary interface steps. Hence a complementary theory of debugging is developed using the same approach as in the planning theory. Figure 1:4 shows a taxonomy of debugging techniques. This taxonomy bifurcates into techniques for diagnosing the underlying cause of a bug and techniques for repairing the bug once isolated. Model diagnosis is typical of the diagnostic techniques. It consists of executing the program in order to construct a list of violated model predicates, which is then examined to check if any code was written to accomplish the violated predicates. As in the planning theory, the debugging taxonomy is transformed into an ATN (figure 1:5) by providing registers, arc predicates and so on. The debugging ATN is called DAPR (debugger of annotated programs), and is an integral part of the PATN system.
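Model diagnosis as described above has two steps: collect the violated model predicates, then check whether any code was written for each. The sketch below assumes a simplified representation in which the model maps predicate names to tests on the program's effect and the plan maps predicate names to the code intended to accomplish them; the function name, the labels "missing code" and "buggy code", and the sample data are all hypothetical.

```python
def model_diagnosis(model, effect, plan):
    """Classify each violated predicate by whether code was written for it.

    model:  dict mapping predicate name -> test(effect) -> bool
    effect: description of what the executed program actually produced
    plan:   dict mapping predicate name -> list of code lines written for it
    """
    violations = [name for name, holds in model.items() if not holds(effect)]
    report = {}
    for name in violations:
        # No code at all suggests an incomplete plan; some code suggests a bug in it.
        report[name] = "buggy code" if plan.get(name) else "missing code"
    return report


# Hypothetical data: the well is drawn correctly, but nothing connects the parts.
model = {
    "square-well": lambda e: e["well_sides"] == 4,
    "connected": lambda e: e["tree_bottom"] == e["well_top_middle"],
}
effect = {"well_sides": 4, "tree_bottom": (0, 0), "well_top_middle": (50, 0)}
plan = {"square-well": ["REPEAT 4 [20 30]"]}

print(model_diagnosis(model, effect, plan))
```

A violated predicate with no corresponding code points to an incomplete plan rather than to an error in existing code, which is the distinction the repair techniques rely on.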

Consider the operation of the planning ATN on an example from the Logo graphics environment, where students define, test, and debug procedures to draw simple pictures. The WISHINGWELL (figure 1:6) is a typical beginner's project. The students' task description is a sketch of the desired picture. PATN, however, requires a formal task description: figure 1:7 illustrates this, the model for WISHINGWELL. Models are expressed in an assertional formalism developed by Goldstein [1974, 1975], which is similar to the first order predicate calculus. The model characterizes the range of pictures which match the sketch.

PATN's solution to the WISHINGWELL task has three aspects: (1) a hierarchical plan derivation, summarizing the arc transitions which were followed; (2) a snapshot of the values of the ATN's registers attached to each node of the derivation, representing the semantic context at the time the node was created; and (3) a set of instantiated arc predicates at each node describing why the chosen arc transition was preferred to its competitors; these are called the pragmatic assertions of the node.⁷ The semantic variables and pragmatic assertions relate the subgoal structure of the problem solving protocol to the model describing the task to be accomplished.
Figure 1:8 shows PATN's hierarchically annotated solution. Naturally, this is not the only solution to the WISHINGWELL problem: to apply PATN to protocol analysis, we allow PAZATN to reject solutions which do not match the protocol data, forcing PATN to back up so that alternative solutions are generated.

[FIGURE 1:6 WISHINGWELL PICTURE: AN ELEMENTARY LOGO GRAPHICS PROJECT]

Draw a WISHINGWELL with a square well and a triangular roof, connected by a pole which is a line. The roof should be above the pole, and the pole above the well. The well should be connected to the pole at the midpoint of the upper side of the well and the lower endpoint of the pole. The pole should be connected to the roof at the midpoint of the bottom side of the roof and the upper endpoint of the pole. The bottom side of the roof and the upper side of the well should be horizontal.

    (DEFINE-MODEL WISHINGWELL ()
      (EXISTS (ROOF POLE WELL)
        (AND (TRIANGLE ROOF)
             (LINE POLE)
             (SQUARE WELL)
             (ABOVE ROOF POLE)
             (ABOVE POLE WELL)
             (EXISTS (P)
               (AND (CONNECTED WELL POLE (AT P))
                    (EQ P (MIDDLE (UPPER (SIDE WELL))))
                    (EQ P (BOTTOM (ENDPOINT POLE)))))
             (EXISTS (Q)
               (AND (CONNECTED POLE ROOF (AT Q))
                    (EQ Q (MIDDLE (BOTTOM (SIDE ROOF))))
                    (EQ Q (UPPER (ENDPOINT POLE)))))
             (HORIZONTAL (BOTTOM (SIDE ROOF)))
             (HORIZONTAL (UPPER (SIDE WELL))))))

Figure 1:7. Predicate Model for a Wishingwell

1.3. Theoretical Interpretations

We define an interpretation of a protocol to be a PATN plan derivation: a parse tree whose fringe is the list of events (e.g., figure 1:8), augmented by annotation associated with each node of the parse. Since different plans sometimes lead to the same coding events, some protocols have more than a single interpretation. The basic claim of this paper is that PATN can efficiently generate the psychologically plausible interpretations.

Evidence for this claim rests on four sources.

1. The heuristic adequacy of PATN as a problem solving program provides suggestive, though by no means decisive, evidence. At least for the restricted world of elementary Logo graphics, hand-simulations indicate that PATN is heuristically adequate.

2. Introspection by human problem solvers is a weak but useful source of evidence. To some extent PATN was designed on the basis of introspection and hence has some support along this dimension.

3. A strong source of evidence is the appropriateness of the replies of a question-answering module that performs retrievals and simple inferences over a database composed of these interpretations. The question-answering module is introduced in chapter two. The replies to the example questions given in that chapter seem appropriate to the authors.

4. The strongest source of evidence is the ability to predict performance in future situations on the basis of past behavior. Chapter three describes modifications to the ATN that provide predictive models of typical problem solving behaviors.

We find this informal evidence sufficiently encouraging (as detailed in the remainder of Book I) to warrant the design (in Book II) of a precise framework for generating SPADE-style protocol interpretations. Future research will rigorously evaluate the psychological validity of these interpretations as follows.

1. PATN will be implemented and tested on a broad range of examples. This will confirm its heuristic adequacy.

2. An editor based on SPADE will be constructed as a structured programming environment, and transcripts of the problem solving behavior of programmers using this editor analyzed. Coupled with systematic interviews, this will provide evidence regarding the sufficiency of SPADE's repertoire of planning and debugging concepts.

3. PAZATN with a question-answering interface will be implemented. The appropriateness of the replies generated by the question-answering module will be judged by skilled but unbiased informants, and by systematic subject interviews.

4. A modeling component will be implemented that modifies the PATN ATN to be more in accord with a particular student's behavior. Tests will be conducted to determine whether the modified ATN is more successful in predicting performance on subsequent protocols.

Before proceeding, a possible misconception involving the distinction between representational frameworks and psychological theories should be dispelled. Two hypotheses are defended by the research program outlined in this paper: (1) that ATN's are a useful representation for the models we are developing; and (2) that particular ATN's, the output of our modeling procedure, constitute theories of individuals, stated in the language of ATN's, which make statements about the presence or absence of certain problem solving skills. Both hypotheses are of course subject to experimental verification. We do not argue that other Turing-universal formalisms (such as productions or Heidorn's [1975] augmented phrase structure grammars) cannot also represent these theories. Stronger claims regarding the validity of ATN's per se as psychological mechanisms require additional assumptions regarding processing costs and limitations which we are not currently prepared to defend.

2. An Example of Protocol Analysis as Parsing

2.1. An Example Problem Solving Protocol
2.2. Structural Description
2.3. Semantics
2.4. Pragmatics



An analogy to computational linguistics has been fruitful both in defining the objectives of analysis and in designing the PAZATN system for automating the analysis process. The analogy suggests partitioning analyses into syntactic, semantic, and pragmatic components. These components correspond to the potential control paths, data flow, and branching conditions of a procedural problem solver. From a problem solving standpoint, these are modeled by the network of states and arcs, the registers, and the transition conditions of the augmented transition network. From a protocol analysis standpoint, syntax is represented as a parse tree; semantics and pragmatics are represented as annotation (variables and assertions) associated with each node of the parse tree.

This chapter presents a SPADE interpretation for a typical WISHINGWELL protocol. Book II provides a hand-simulation of PAZATN generating this interpretation.

2.1. An Example Problem Solving Protocol

Since analysis consists of the selection of a PATN plan derivation, analyzing a protocol identical to PATN's default solution (figure 1:8) is trivial. Hence, a different protocol, involving a variety of plans including reformulation and repetition, serves as our example.

The student begins by writing an iterative procedure to draw the square WELL.

E01  ?TO WELL
E02  >10 REPEAT 4 [20 30]
E03  >20 FORWARD 100
E04  >30 RIGHT 90
E05  >END

A superprocedure for the WISHINGWELL is defined by a sequential plan drawing first TREE, a previously defined procedure, and then WELL.

E06  ?TO WW
E07  >10 TREE
E08  >20 WELL
E09  >END

The WW program is executed, producing figure 1:9.

E10  ?WW

The program is edited to include an interface establishing the proper relation between TREE and WELL.

E11  ?EDIT WW
E12  >13 RIGHT 90
E13  >15 FORWARD 50
E14  >17 RIGHT 180
E15  >END

2.2. Structural Description

The result of analyzing this protocol is a data structure, the interpretation, consisting of syntactic, semantic, and pragmatic components. The syntactic component, diagrammed in figure 1:10, is the protocol's structural description: a parse tree representing the sequence of PATN arc transitions required to generate it.

Such structural descriptions capture one aspect of problem solving behavior. They provide a formal basis for answering questions regarding which plan types were used, a topic which could otherwise be discussed only intuitively.¹⁰ Their most direct application is to answering "how questions."

[FIGURE 1:9 WW AT E10: INTERFACE NEEDED. Picture labels: TURTLE BEGINS HERE, TURTLE ENDS HERE.]

[FIGURE 1:10 ABBREVIATED STRUCTURAL DESCRIPTION FOR STUDENT WW. Parse tree whose fringe is events E01 through E15; node labels include Solve, Ref, Plan, CONJ-SEQ, Solve (TREE), Solve (WELL), repetition, user-subr-call, Debug, Diagnosis, Model-diagnosis, Repair, Complete, and Solve (INT-TW).]

Q1. How does procedure WELL accomplish a square?

A1. WELL uses a repetition plan. The generic subgoal is a side.

Q2. How does procedure TREE accomplish a roof and pole?

A2. The roof and pole model parts were regrouped into a tree by a reformulation plan. Procedure TREE was already in the answer library, allowing an identification plan.

Still, the parse tree is an incomplete description: it does not indicate the semantic relationships between subgoals or the pragmatic criteria governing the choice of one plan over another.

2.3. Semantics

Semantic annotation is defined to be the values of semantic variables associated with each node of the parse. These variables relate the plan to the formal problem model by recording the contents of the ATN's registers at the time the node was generated. The following are typical PATN registers.

1. :PLAN is the hierarchically annotated program defined below the current node, reflecting its state after dominated editing events have been processed.

2. :CODE is the fringe of :PLAN.

3. :EFFECT is a description of the effect of executing the code defined below the current node. Since the code may contain references to undefined user procedures, :EFFECT may be unassigned at a given node. For the elementary graphics domain, this variable is called :PICTURE, and describes the picture drawn by the program in Cartesian coordinates.

4. :MODEL is the set of predicates which :CODE is intended to accomplish. For a correct program :EFFECT is an instance of :MODEL.

5. :ADVICE is a list of planning suggestions generated by PATN arc actions. For example, the linearization arc (see PATN's conjunction node in figure 1:3) creates advice regarding both the order in which subprocedures should be written, and the order in which they should be invoked.

6. :CAVEATS is a list of warnings for potential bugs generated by PATN when heuristic guidelines are used in planning. For example, if no interactions are detected when solving a problem involving an unfamiliar domain predicate, it is possible that the predicate actually gives rise to interactions, but their patterns have not yet been learned by the system. Hence, :CAVEATS can be set, recording this potential bug, on the arc transition from conjunction to sequential plans. This information guides DAPR, PATN's debugging module, in subsequent diagnosis.

7. :VIOLATIONS is the list of model predicates which are not satisfied by the :EFFECT achieved by :CODE. This register and :EFFECT are set by a performance annotation module designed by Goldstein [1974].

Let us sample the values of the semantic variables at various nodes of the parsed WW protocol. :MODEL for the top level SOLVE node was shown in figure 1:7. For the INT-TW SOLVE node, :MODEL is:

The tree must be above the well, and the bottom endpoint of the tree must connect to the midpoint of the upper side of the well.

In our LISP-oriented model language notation this is represented as:

    (AND (ABOVE TREE WELL)
         (EXISTS (P)
           (AND (CONNECTED TREE WELL (AT P))
                (EQ P (MIDDLE (UPPER (SIDE WELL))))
                (EQ P (BOTTOM (ENDPOINT TREE)))))).

This submodel reflects the reformulation of WISHINGWELL into a TREE and a WELL.

Typically, semantic annotation is relevant to answering "what questions." The above value of :MODEL for the INT-TW node provides an example.

Q3. What is the purpose of lines 13, 15 and 17 of WW?

A3. Those three lines are in-line code interfacing subprocedures TREE and WELL. The interface establishes connectivity at the appropriate point, and causes the tree to appear above the well.

:VIOLATIONS at the PLAN node for WW provides another example.

Q4. What is wrong with procedure WW when it is first executed (at event E10)?

A4. The necessary relations between the model parts TREE and WELL have not been established: specifically, there is no point P such that the tree is connected to the well at P, where P is the middle of the upper side of the well and P is the lower endpoint of the tree.

The :VIOLATIONS variable which mediates this answer is non-empty at the PLAN node because the debugging which generates the missing interface has not yet occurred; the English answer simply paraphrases its LISP value:

    (NOT (EXISTS (P)
           (AND (CONNECTED WELL TREE (AT P))
                (EQ P (MIDDLE (UPPER (SIDE WELL))))
                (EQ P (BOTTOM (ENDPOINT TREE)))))).

2.4. Pragmatics

Pragmatic annotation is defined to be a record of the justifications for selecting a given arc transition over its competitors, and constitutes an hypothesis about the reasons for using a particular plan. REASONs are assertions attached to each node of the parse. The REASON for using a particular plan in a particular situation is an instance of the arc predicate leading to the ATN state for that plan, where the current values of the registers are taken into account. For example, the reason that WELL was decomposed using a repetition plan in the protocol is that :MODEL at that node was generic.

    (REASON (REPETITION E02)
            (GENERIC (:MODEL E02))).

Pragmatic annotation is germane to answering "why questions."

Q5. Why did the student execute WW at event E10? Did (s)he believe the program to be correct?

A5. Probably the student expected bugs. A reasonable strategy is to initially plan only for the main steps, with the interfaces solved later by debugging. WW was executed at E10 in order to discover what interfacing, if any, was needed.

This illustrates the analysis of a procedure containing the rational bug of constructing an incomplete plan. Debugging operations are analyzed by postulating the application of some DAPR technique. The reasons for debugging operations typically involve localizing or repairing the cause of some model violation. The purpose of running the program at E10 was to perform model diagnosis; this technique was chosen because the occurrence of two consecutive mainsteps (with no explicit interface) implies that the plan may be incomplete:

    (REASON (MODEL-DIAGNOSIS E10)
            (AND (OPTIONAL (INTERFACE TREE WELL))
                 (MISSING (INTERFACE TREE WELL)
                          (:PLAN E06)))).

In this case, model diagnosis demonstrates the existence of violated predicates for which no code exists: the plan is in fact incomplete. This is the reason for the subsequent editing: repair of the incomplete plan by resuming planning at the offending locale.

The reason for the completion plan in the editing episode (E12 through E14) is to eliminate the violations by supplying the missing interface between TREE and WELL.

    (REASON (COMPLETE (E12 E13 E14))
            (AND
              (MEMBER
                '(NOT
                   (EXISTS ...
                     (AND (CONNECTED WELL POLE (AT P))
                          ...)))
                (:VIOLATIONS E10))
              (EQUAL
                '(EXISTS (P)
                   (AND (CONNECTED WELL POLE (AT P))
                        ...))
                (:MODEL (INTERFACE TREE WELL)))
              (MISSING (INTERFACE TREE WELL) (:PLAN E10)))).

The conjunction of predicates collectively called SEQ (on the arc from conjunction to sequential) plays a role in the following example.

Q6. Why was the invocation order (TREE, WELL) used, rather than the reverse?

A6. TREE ends at its bottom, a required connection point, resulting in simpler interfacing for that ordering. If the TREE began at that connection point, the reverse order would have been preferable.

Here one of the SEQ predicates incorporates knowledge about the domain predicate CONNECTED: that interfacing can be simplified if two subprocedures are invoked in an order such that the endpoint of the first corresponds to a mutual connection point. An instance of this rule becomes a pragmatic assertion of the SEQ node in the parse.

The reason for preferring the (TREE WELL) sequencing is that TREE ends at a required connection point of WELL.

    (REASON (SEQ (E07 E08))
            (AND
              (EQ (POSITION :TURTLE (AFTER TREE))
                  (BOTTOM (ENDPOINT TREE)))
              (EQ (POSITION :TURTLE (AFTER TREE))
                  (MIDDLE (UPPER (SIDE WELL))))))
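The ordering rule behind A6 can be sketched as a small preference function. This is a rough illustration under assumptions, not one of PATN's SEQ predicates: turtle positions are represented as descriptive strings, and the function name `choose_order` and the sample positions are hypothetical.

```python
def choose_order(a, b, end_of, connection_points):
    """Return (first, second), preferring an order whose first procedure
    leaves the turtle on a shared connection point."""
    if end_of[a] in connection_points:
        return (a, b)
    if end_of[b] in connection_points:
        return (b, a)
    return (a, b)   # neither ordering simplifies interfacing: keep the given order


# Hypothetical symbolic positions for the WISHINGWELL subprocedures.
end_of = {
    "TREE": "bottom endpoint of tree",
    "WELL": "corner of well",
}
connection_points = {"bottom endpoint of tree", "middle of upper side of well"}

print(choose_order("WELL", "TREE", end_of, connection_points))
```

Under these assumptions TREE is invoked first because it finishes on a connection point, mirroring the first conjunct of the REASON assertion; the second conjunct, which equates that position with the middle of the well's upper side, would require geometric knowledge that this sketch omits.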

A precise definition of a linguistic approach to protocol analysis has been provided and a concrete analysis of this kind supplied. We now turn our attention to the potential utility of the approach for constructing cognitive models of individuals.

3. Toward a Cognitive Model of the Individual

3.1. Tailoring the ATN to the Individual
3.2. Individual Differences and Overlay Modeling
3.3. Issues and Examples, and the Computer as Coach

3.1. Tailoring the ATN to the Individual

Advocates of computer-aided instruction point out that computers can be used to tailor instruction to the needs of the individual. Yet little is known about what it means to construct cognitive models of individual students, or about how to use these in providing sensitive and effective automatic tutoring. The SPADE theory suggests an approach.

SPADE confronts the problem of individual differences by considering the possible ways in which the student's ATN can differ from that of an expert. One error would be to have a variant of the optimal pragmatic arc constraints. More serious would be to have missing or extra arcs. Even more serious would be to have missing or extra states. Differences which can be formalized as alterations to the topology of the ATN are manifested in the production of a different set of parse trees: PATN might be capable of some derivations not available to the student, or vice versa. Differences in arc conditions or arc actions are manifested by the selection of other than the optimal plan for a particular problem situation, although the same repertoire of plans may be available.
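Topological differences between a student's ATN and the expert's can be computed with simple set operations. The sketch below assumes each network is reduced to a mapping from states to the states reachable by one arc; the function name `compare_atns`, the start state `DEBUG`, and the sample student network are hypothetical, and differences in arc conditions or actions are not represented.

```python
def compare_atns(expert, student):
    """Report states and arcs present in one network but not the other.

    Each network is a dict mapping a state name to the set of states
    reachable from it by a single arc.
    """
    expert_arcs = {(s, t) for s, targets in expert.items() for t in targets}
    student_arcs = {(s, t) for s, targets in student.items() for t in targets}
    return {
        "missing_states": set(expert) - set(student),
        "extra_states": set(student) - set(expert),
        "missing_arcs": expert_arcs - student_arcs,
        "extra_arcs": student_arcs - expert_arcs,
    }


# Hypothetical debugging fragment: the student can repair without diagnosing.
EXPERT = {"DEBUG": {"DIAGNOSIS"}, "DIAGNOSIS": {"REPAIR"}, "REPAIR": set()}
STUDENT = {"DEBUG": {"DIAGNOSIS", "REPAIR"}, "DIAGNOSIS": {"REPAIR"}, "REPAIR": set()}

print(compare_atns(EXPERT, STUDENT))
```

A non-empty difference of this kind predicts a different set of available derivations, whereas identical topologies with different arc predicates would predict the same repertoire of plans selected in different situations.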

These types of Modifications, properly combined, can account for many 
commonly observed weaknesses in atudont problea solving., To demonstrate this 
point, we present six examples of student weaknesses and the fashion in which our 



■ode ybg scheme is .'able to capture, them. The examples 'are derived frost informal 

T ! . 
data collected in our prior Logo tutoring experience's. 
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1. BASIC Syndrome: A studtnt with prior programing experience In/ 
.the BASIC language never usei recursion. Problems for which 
Iteration Is awkward are solved only with difficulty; problems; 
for which Iteration Is inadequate ,\ such as drawing arbitrarily 
deep binary trees, are unsolvable. 

* * s 

/ » / 

» I" 

I 

A deviant version of the PATN subgraph for repetition planning is
illustrated in figure 1:11. The correct subgraph has an intermediate ROUND plan
state; the deviant version, missing this state and its associated arcs,
characterizes the BASIC syndrome. The student's repetition arc bypasses the
ROUND state, short circuiting the ATN to pursue the iteration option with no
possibility of recursion. In general, failure to employ a full repertoire of
planning options can be modeled in this fashion: the short circuit is postulated
to occur at the node immediately prior to the least common superset of the class
of unused plans.
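The effect of a missing state can be illustrated with a minimal sketch. It assumes the ATN subgraph is reduced to a plain adjacency table; the state names REPEAT, ITERATION and RECURSION are hypothetical labels chosen for illustration (only ROUND is named in the text), and the real PATN arcs carry predicates and actions that are omitted here.

```python
def reachable(arcs, start):
    """Return every state reachable from `start` by following arcs."""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        if state not in seen:
            seen.add(state)
            stack.extend(arcs.get(state, ()))
    return seen

# Hypothetical state names; only ROUND appears in the text.
EXPERT = {"REPEAT": ["ROUND"], "ROUND": ["ITERATION", "RECURSION"]}
# Deviant subgraph: ROUND and its arcs are missing, so the
# repetition arc short circuits directly to iteration.
DEVIANT = {"REPEAT": ["ITERATION"]}

# States the expert can reach but the student cannot.
unused = reachable(EXPERT, "REPEAT") - reachable(DEVIANT, "REPEAT")
```

Under these assumptions, `unused` contains the ROUND state and the recursion option, which is the class of plans the student never employs.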

    2. Discontinuity: A student fails to build upon previous work,
       never taking advantage of relevant existing procedures. Each
       new picture to be drawn is treated as an isolated problem, and
       recurring subproblems are repeatedly solved afresh.

PATN can accomplish identifications using either primitives or previously
solved problems. Discontinuity amounts to examining the primitive library only.
This is modeled by the absence of the corresponding predicate on the arc from
PLAN to IDENTIFY. A similar but more subtle case would be a student that
occasionally uses previous solutions, but not as often as PATN predicts. This
indicates that the identification network is probably intact, but parts of the
reformulation subgraph are missing. Such a student fails to notice the relevance
of previous problems because they are described in slightly different terms.
Introspection suggests that this is a common source of difficulty.

FIGURE 1:11 DEVIANT ATN SUBGRAPH FOR REPETITION PLANS
    3. Diagnosis Avoidance: A student performs well in planning and
       defining programs. However, when a bug occurs, the student
       falters. Rather than systematically localizing the underlying
       cause of the error, followed by repair, the student immediately
       begins to edit the program. The changes are haphazard and
       counterproductive, creating more bugs than they eliminate.

Relative to the DAPR debugging ATN, diagnosis avoidance is a weakness
wherein the student has an extra arc not present in the expert. Whereas DAPR
cannot proceed to the REPAIR state without first passing through DIAGNOSIS, the
student is modeled as having an undesirable extra arc bypassing this state
(figure 1:12). This allows diagnosis to be (incorrectly) treated as optional.
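An extra arc can be detected mechanically. The following sketch assumes the debugging ATN is reduced to an adjacency table and asks whether REPAIR can be reached while avoiding DIAGNOSIS. The entry state name BUG is a hypothetical label for illustration; DIAGNOSIS and REPAIR are the states named above.

```python
def can_bypass(arcs, start, goal, required):
    """True if `goal` is reachable from `start` without visiting `required`."""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        if state == required or state in seen:
            continue              # never travel through the required state
        if state == goal:
            return True
        seen.add(state)
        stack.extend(arcs.get(state, ()))
    return False

# BUG is a hypothetical entry state for the debugging subgraph.
EXPERT = {"BUG": ["DIAGNOSIS"], "DIAGNOSIS": ["REPAIR"]}
# The student version has one undesirable extra arc, BUG -> REPAIR.
STUDENT = {"BUG": ["DIAGNOSIS", "REPAIR"], "DIAGNOSIS": ["REPAIR"]}

expert_skips = can_bypass(EXPERT, "BUG", "REPAIR", "DIAGNOSIS")
student_skips = can_bypass(STUDENT, "BUG", "REPAIR", "DIAGNOSIS")
```

In this sketch the expert graph cannot skip diagnosis, while the student graph can, which is the sense in which diagnosis becomes optional.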

    4. Syntactically Unstructured Code: A student never uses
       subprocedures, instead relying entirely on in-line code. This
       results in long, unreadable programs which are difficult to
       debug. Often the student forgets which subgoals have been
       solved, or forgets how previously solved code segments work.
       Few projects are successfully completed.

PATN's use of subprocedures is governed by register setting actions
associated with the sequential refinement loop. This is the culpable locale for
a type of non-modular design we call syntactically unstructured code. Instead of
first setting the :PLAN register to a sequence of subprocedure calls, and then
pushing for a solution to each in turn, the student apparently performs these
actions in the reverse order: first pushing for a solution to each subprocedure,
and then setting the :PLAN register to the concatenation of the popped results.
Note that this deviant ordering of arc actions requires far more intermediate
storage to keep track of recursive calls to the ATN: given a limited pushdown
stack, it is not surprising that the student forgets things.

    (EXISTS (MODEL-STATEMENT STEP LOC)
            (AND (MEMBER (NEGATE MODEL-STATEMENT) (:VIOLATIONS LOC))
                 (EQUAL MODEL-STATEMENT (:MODEL STEP))
                 (MISSING-STEP (:PLAN LOC))))

    (EXISTS (STEP LOC)
            (AND (OPTIONAL STEP)
                 (MISSING-STEP (:PLAN LOC))))

    :VIOLATIONS (INTERPRET :CODE :MODEL)

FIGURE 1:12 DEVIANT ATN SUBGRAPH BYPASSING DIAGNOSIS

    5. Semantically Unstructured Code: A student mechanically begins
       every Logo procedure with the PENUP command. Usually this works
       out well, in preparation for a position setup. However, even
       when the position setup is unnecessary, the PENUP is still used,
       resulting in either: (a) a rational form violation, in which
       the PENUP is followed immediately by a PENDOWN; or (b) one or
       more missing model part violations, due to the invisibility of
       vectors intended to accomplish main steps. (Solomon [1976] uses
       the term cliche to describe this class of phenomena.)

Treating some optional constituent as if it is required results in a
second kind of non-modularity, semantically unstructured code. The particular
cliche just described, mechanically including PENUP commands, is mediated by the
linear decomposition arc. If this arc is modified to create position setup
subgoals without testing whether :MODEL actually requires such a setup, the
effect is to include a PENUP at the start of each turtle program.

    6. Pragmatically Unstructured Code: A student who formally does
       break large programs into subprocedures nevertheless encounters
       numerous bugs, many of which are difficult to localize. The
       subprocedures lack modularity, each being dependent on knowledge
       of the inner workings of others. For example, interfaces are
       included as part of main steps, so that the initial state of a
       given procedure is determined by the final state of whichever
       procedure happens to precede it in the planned order of
       invocation.

While failure of a particular arc action to consider the problem at hand
results in semantically unstructured code, faulty arc predicates in deciding
among alternative arcs lead to a third form of non-modularity, pragmatically
unstructured code. The unnecessary construction of non-linear subprocedures is
attributable to either improper default ordering or malfunctioning predicates on
the arcs leaving the conjunction node. For example, the INTERACTIONS predicate
may not be imposing sufficiently strong conditions on accepting the model: this
leads to the addition of constraints on the subprocedures when no real
non-linearity is present.

Thus perturbations on PATN provide a deep theory of student weaknesses,
explaining unsuccessful behavior in terms of the syntactic, semantic, and
pragmatic structure of the ATN.



3.2. Individual Differences and Overlay Modeling

We envision inducing a model of individual idiosyncrasies as
perturbations of PATN by applying Goldstein's [1976] overlay modeling technique.
This approach describes individuals with respect to an expert problem solving
program by associating probabilities with each decision point in the expert,
representing our state of knowledge about a given individual's preferences. The
probabilities are a summary of the available evidence rather than an integral
part of the model: at any given time, a process model is obtained from the
overlay probabilities by including those possibilities that are above threshold
and excluding those that are below. 13 Goldstein and Carr [1977] use this
technique to infer process models of behavior in a logic and probability game
called WUMPUS.

This raises the question of whether all of the perturbations mentioned
above, including the alterations in ATN topology, can in fact be represented by
such an overlay, i.e., by a numerical plausibility table. It turns out that they
can. A missing arc can be handled by assigning it an a priori transition
plausibility of 0. A missing intermediate state can likewise be represented by
the plausibility of the arcs leading to the unused states being 0, but
plausibility of the arc to the "short circuited" state being 1. Similarly,
default orderings can be reversed by reversing their relative plausibilities.

This table driven organization allows distinguishing between personal and
archetypal ATN's. Archetypal ATN's are analogous to Winston's [1970] concept
models, and in fact our scheme for inducing personalized ATN's bears some
resemblance to Winston's learning system, except that our networks happen to have
procedural rather than structural meanings. Personalized ATN's are created from
the archetype by thresholding over a particular plausibility table. This
simplifies the tuning and debugging of the system and eliminates the danger that
states or arcs added to model non-expert behaviors might degrade the performance
of the expert. The expert ATN is obtained by coupling the archetype with an
expert plausibility table. The entries for undesirable options are zeroes.
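The thresholding step can be sketched as follows. This is only an illustration under stated assumptions: the archetype is represented as a flat list of arcs, the state names other than ROUND are hypothetical, and the threshold value of 0.5 is arbitrary. The archetype deliberately contains the non-expert arc from REPEAT directly to ITERATION, which the expert table zeroes out.

```python
# Archetypal arcs, including one added only to model non-expert behavior.
# All names except ROUND are hypothetical labels for illustration.
ARCHETYPE = [
    ("REPEAT", "ROUND"),
    ("ROUND", "ITERATION"),
    ("ROUND", "RECURSION"),
    ("REPEAT", "ITERATION"),   # the "short circuit" arc
]

def personalize(archetype, table, threshold=0.5):
    """Keep only arcs whose plausibility is above threshold (assumed 0.5)."""
    return [arc for arc in archetype if table.get(arc, 0.0) > threshold]

# Expert table: the undesirable short circuit has plausibility 0.
EXPERT_TABLE = {
    ("REPEAT", "ROUND"): 1.0,
    ("ROUND", "ITERATION"): 1.0,
    ("ROUND", "RECURSION"): 1.0,
    ("REPEAT", "ITERATION"): 0.0,
}
# BASIC syndrome: arcs to the unused states are 0, the short circuit is 1.
BASIC_TABLE = {
    ("REPEAT", "ROUND"): 0.0,
    ("ROUND", "ITERATION"): 0.0,
    ("ROUND", "RECURSION"): 0.0,
    ("REPEAT", "ITERATION"): 1.0,
}

expert_atn = personalize(ARCHETYPE, EXPERT_TABLE)
student_atn = personalize(ARCHETYPE, BASIC_TABLE)
```

Because the expert and the student differ only in their tables, adding the short circuit arc to the archetype leaves the derived expert ATN unchanged.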

A first approximation to the plausibility table for an individual can be
derived from the relative frequencies of arc transitions in previously analyzed
protocols. However, this ignores the semantic and pragmatic context. It could
be that infrequently used transitions were inappropriate for the tasks performed.
Consequently, this is refined by comparing the tallies to a record of the
expert's performance over the same set of tasks. (This technique, differential
modeling, is suggested by Burton & Brown [1976].) Naturally there will be
differences in individual protocols because of arbitrary choices, but in the long
term consistent properties of the student's behavior should emerge.
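A minimal sketch of the tallying and comparison, assuming each analyzed protocol has been reduced to a list of (state, arc taken) pairs. The function names, the data layout, and the state labels are assumptions made for illustration, not part of the PAZATN design.

```python
from collections import Counter

def relative_frequencies(transitions):
    """Map each (state, arc) pair to its share of the choices made at that state."""
    per_state = Counter(state for state, _ in transitions)
    per_arc = Counter(transitions)
    return {key: count / per_state[key[0]] for key, count in per_arc.items()}

def differential(student, expert):
    """Student frequency minus expert frequency over the same set of tasks."""
    s, e = relative_frequencies(student), relative_frequencies(expert)
    return {key: s.get(key, 0.0) - e.get(key, 0.0) for key in set(s) | set(e)}

# Hypothetical tallies for four tasks solved by both student and expert.
student = [("ROUND", "ITERATION")] * 4
expert = [("ROUND", "ITERATION")] * 2 + [("ROUND", "RECURSION")] * 2

gaps = differential(student, expert)
```

A large negative entry marks an arc the expert used on these tasks but the student did not, which is stronger evidence of a weakness than a low raw frequency alone.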

Just recording arc transitions is still too crude. One should account
for differences in terms of the smallest chunks of malfunctioning knowledge which
can be isolated. As a second order cognitive model, the units of analysis are
taken to be the individual arc predicates and actions. The statistical evidence
can be used to differentiate which arc operations are malfunctioning or missing.

3.3. Issues and Examples, and the Computer as Coach

Two crucial ingredients are lacking in current uses of computers in
education: a cognitive theory describing the problem solving and learning
processes, and a pedagogical theory prescribing techniques to facilitate and
enhance these processes. As a result, many instructional applications of
computers are ad hoc, if not detrimental.

There are exceptions to these criticisms. The Logo project [Papert 1971]
offers educational applications of computer technology suggested by a
computational approach to problem solving and learning. However, the
justification for many Logo insights remains informal and intuitive. The current
work is an effort to increase the theoretical precision and experimental rigor of
Logo research. Other exceptions include the work of John Seely Brown's group
[Brown et al. 1974, 1973; Burton & Brown 1976] on intelligent instructional
systems for electronics (SOPHIE) and elementary mathematics (WEST), and that of
Stansfield, Carr and Goldstein [Stansfield et al. 1976; Goldstein & Carr 1977] on
an advisor for WUMPUS. The WEST tutor suggests a paradigm, also used in WUMPUS,
in which issues (abstracted differences between expert and novice behavior) are
illustrated by concrete examples of their application, to active learning
situations.

Given the cognitive modeling tools developed in this chapter, an
issues-and-examples Logo tutor can be contemplated. When PATN's expectations are
violated because of a difference between the expert and student versions of the
ATN, then that issue can be raised with the student. This would extend the
issues-and-examples paradigm of WEST and the computer-as-coach paradigm of
WUMPUS, not only by addressing a more difficult domain, but also by
elaborating the notion of issues, from abstractions of empirically selected
features, to specific programmatic weaknesses.

The theory would also constrain the order in which issues should be
presented to the student. The topology of the ATN should be nearly right before
pragmatic arc constraints are discussed. Likewise, the general form of the
pragmatics should be correct before domain-specific arc critics are taught.
Although many subtleties arise which are not touched on here, the approach takes
a step toward theoretical foundations for computer tutors which provide
sensitive, flexible, individual instruction in problem solving skills.
4. Notes to Book I

1. SPADE is an acronym for Structured Planning and Debugging. See
[Goldstein & Miller 1976a,b; Miller & Goldstein 1976a,b,c].

2. More accurately, the session transcript is a partial protocol.
Considerable leverage is obtained by assuming that the dialogue occurs within the
confines of a small, well-defined response menu: natural language processing
need not be attempted. We recognize that thorough protocol analysis includes
parallel examination of the subject's utterances during the session, eye movement
data, retrospective accounts, and so on. Although our sole objective here is
analysis of the session transcript, we intend to corroborate our analyses using
these other sorts of evidence.

3. Miller & Goldstein [1976b] used a context free problem solving grammar
to extract the constituent structure of a student's Logo protocol. That paper
did not develop the more thorough view of analysis we describe in Book I of the
current report.

4. PATN is designed in [Goldstein & Miller 1976b]. It has not yet been
implemented. The use of present tense throughout this document in describing both
PATN and PAZATN is for readability only.

5. For efficiency, some states with similar topology are merged, and a
few additional arcs are added to provide for such features as iterative control,
when recursively invoking the complete system is unnecessary.

6. The figure adopts a parenthesized notation (which is formally
equivalent to that used in our earlier papers) to emphasize that predicate models
are just LISP S-expressions which can be evaluated.

At first these predicate models will be supplied by the experimenter.
Eventually we plan to construct a module to induce the model from a hand-drawn
tablet sketch. A significant undertaking itself, this would enhance the
practicality of automatic protocol analysis in the graphics domain.

7. Generation of pragmatic assertions representing instances of arc
predicates is an elaboration of the basic PATN design, not presented in
[Goldstein & Miller 1976b]. These assertions, being directly computable by
examining the ATN's arcs and the semantic variables, are synthetically redundant,
but become important when analytic complexities such as irrational bugs and
personalized ATN's are considered.

8. The example is a simplified hypothetical protocol not involving
careless errors such as mistypings. In other respects, however, it is typical of
student protocols for tasks similar to WISHINGWELL.
9. The root of the parse tree is shown at the left; the leaves are to
the right. Some details are not shown: ellipses are indicated by three periods.
For clarity, some semantic information is included parenthetically: SOLVE(WELL).

Logo punctuation events are of minor importance in the underlying plan.
Although used as clues during parsing, they are not included in the structural
description. In the figure they are shown enclosed in exclamation marks:
!E05 >END!.

Since the order in which subgoals are solved need not mirror their
execution order in the resulting plan, events need not occur in the parse in
temporal order. In the figure the events are shown in temporal order, but lines
are crossed.

10. To illustrate the insights gained from the analysis, we use a
scenario for a question-answering module which performs retrievals and simple
inferences over a database consisting of the analyzed protocol. We are confident
that the data structures generated by our style of analysis are sufficient to
support this type of interaction. However, we have not yet designed the
question-answering module per se; instead, we have concentrated on isolating the
relevant knowledge base. For readability, the questions and answers are stated
here in unrestricted English; for ease of implementation, the actual system will
be restricted to a formal query language.

11. It might seem that this definition of pragmatic annotation is
inadequate for protocol analysis, since a student may select the right plan but
for the wrong reason. The SPADE approach handles this circumstance by a separate
mechanism, personalized ATN's, to be discussed shortly. For ease of
presentation, the example uses the expert ATN as the basis for its REASON
assertions.

12. Although PATN's default solution to the WISHINGWELL task did not
involve debugging, PATN is capable of rational bugs such as this particular
incomplete plan. When solving novel tasks, it is sometimes more efficient to
plan only for the main steps, with the interfaces being solved by subsequent
debugging. During planning, PATN notes those points where the plan is
incomplete; when a bug is encountered, this advice guides PATN's debugging
module, DAPR.

Not all rational bugs are incomplete plans, and not all bugs are
rational. Overlooking an interaction between subgoals is another type of
rational bug. Mistypings and misspellings are typical irrational bugs; our
approach to their analysis should be mentioned. The reason for such an event is
assumed to be the same as the reason for the correct version of the event, but
flagged by an additional assertion stating the nature of the mistake.

13. Of course, one can also use probabilities to model actual
non-determinism in the subject's behavior, but we do not consider that possibility
here.
Book II: Automating the Protocol Parsing Process

5. Introduction to Book II

5.1. The CMU Series of Analyzers

5.2. Overview of PAZATN

Book I developed the SPADE notion of protocol analysis as parsing, but
did not indicate how parses are to be derived. Automating the analysis process
is desirable, because manual analysis is informal, tedious, error prone, and not
amenable to incorporation into computerized tutors. Hence, this second book
presents the design for PAZATN, an automatic protocol analyzer based on the SPADE
theory. As background for assessing the design of PAZATN, we first summarize the
features and limitations of a series of automatic protocol analyzers developed at
Carnegie-Mellon University.

5.1. The CMU Series of Analyzers

Much ground-breaking research in automatic protocol analysis has been
performed at Carnegie-Mellon University. PAS-I [Waterman & Newell 1972], the
first of three CMU systems, analyzes think-aloud protocols for cryptarithmetic.
PAS-II [Waterman & Newell 1973] is an interactive version which makes fewer
task-specific assumptions. SAPA [Bhaskar & Simon 1976] addresses the additional
complexities of semantically rich task domains.

By focusing on the cryptarithmetic task, PAS-I obtains sufficient
leverage to completely automate the analysis process. The input to PAS-I is a
transcription of a tape recorded think-aloud protocol and its output is a problem
behavior graph. PAS-I operates in four stages, the first two of which occur
sequentially in time: linguistic analysis, semantic analysis, processing of
operator groups, and problem behavior graph generation. PAS-I does not attempt
generality; for example, the linguistic analyzer employs a key word grammar
oriented to cryptarithmetic. Similarly, Process-Column is a typical operator.
PAS-II reduces dependency on a single domain by requesting guidance from
a human encoder. Task-specific knowledge is factored into a separate set of
rules; the domain independent part of the system amounts to a command language
or subroutine library to assist a human protocol analyst. Moving from automatic
to interactive analysis may seem counter to progress. However, this
methodological contribution allows flexibility to incorporate the experimenter's
insight, while still imposing discipline on the encoding process. We intend to
construct an interactive analyzer as an intermediate milestone in implementing
PAZATN.

SAPA, in cooperation with a human encoder, analyzes protocols in chemical
engineering thermodynamics. By considering a domain rich in background
knowledge, rather than puzzle problems such as cryptarithmetic, SAPA addresses a
complex new facet of problem solving. However, SAPA is highly domain specific.
For example, SAPA begins the analysis by asking for the form of the energy
equation used by the subject. Thermodynamics problem solving is viewed as a
variant of means-ends analysis in which the energy equation plays a predominant
role.

When implemented, PAZATN will extend the automatic protocol analysis
techniques developed at CMU by complementing their features and limitations. On
one hand, a PAZATN shortcoming -- its restriction to a small menu of responses --
is addressed by the considerable effort CMU researchers have invested in natural
language front-ends for protocol analysis. On the other hand, CMU has devoted
less attention to the investigation of planning concepts, a limitation addressed
by the SPADE theory. For example, the CMU theory does not provide a deep account
of the origins of planning errors in the PATN sense. Likewise, a practical
limitation of the CMU analyzers has been task specificity. In designing PAZATN
we have tried to minimize task specificity through modular design; this is made
possible, in part, by the highly structured underlying SPADE theory. However,
testing the generality of PAZATN by applying it to several domains remains a
research goal. Finally, the elementary programming world to which PAZATN is
applied in this paper resembles thermodynamics in that background knowledge of
the domain plays a significant role in solving problems.

5.2. Overview of PAZATN

PAZATN is a scheme for matching a protocol to a PATN plan derivation; it
can only understand protocols which PATN can generate. 2 Therefore the analysis
could be performed, in principle, by trying all possible PATN solutions,
selecting the first which matches the data. Since exhaustive enumeration is
impractical, a primary consideration is efficient search in PATN's plan space.
Bottom-up protocol evidence is used for this purpose (figure 1:1).

PAZATN consists of PATN supplemented by several additional modules and
data structures (figure 11:1). This design incorporates three key ideas:

    1. the use of the chart data structure [Kay 1973; Kaplan 1973] in
       two distinct roles, both involving the need to economically
       store alternative combinations of substructures;

    2. the use of a library of domain-specific specialists for
       processing events in various syntactic categories;

    3. the use of best first coroutine search driven by a separate
       scheduler -- with modules communicating by means of the charts
       -- to ensure early application of strong sources of constraint.

1. Two charts. One of PAZATN's charts, the planchart, keeps track of
subgoals proposed by PATN. PAZATN's second chart, the datachart, records the
alternative ways of associating protocol events with planchart leaves.
2. Library of event specialists. The syntactic classification of
possible events for a given domain results in a highly modular design. (ADDCODE,
RUNCODE and END are typical Logo event types.) For each event type PAZATN is
supplied with an ESP, i.e., a specialist for associating events of that type with
planchart leaves. Adapting PAZATN to other problem domains is possible by
replacing this library.

3. Best first coroutine search. PAZATN receives the model and protocol
as input. The model is a formal statement of the problem as shown in figure 1:7;
the protocol is a list of events as shown at the top of page 1.19. PAZATN's
output is a SPADE interpretation of the protocol as described in Book I: a parse
tree augmented by semantic and pragmatic annotation. At any given time during
analysis, several partial interpretations will be active. The outer loop is a
scheduler which allows each active partial interpretation to examine one event
per cycle. For a given interpretation, events are processed in a single
left-to-right pass. At the end of a cycle the active set is re-chosen. This repeats
until at least one interpretation has processed the final event.
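The scheduling loop just described can be sketched with Python generators standing in for coroutines. This is an illustration under stated assumptions rather than the PAZATN design: each partial interpretation is reduced to a list of expected event types, the score is a simple match count, the `beam` parameter (how many interpretations are active per cycle) is invented here, and the event labels other than TO and END are illustrative.

```python
def interpretation(expected, events):
    """Coroutine: examines one event per resumption, left to right.

    Yields (score so far, whether the final event has been processed).
    """
    score = 0
    for position, event in enumerate(events, start=1):
        # A slice is used so that running past `expected` counts as a mismatch.
        score += 1 if expected[position - 1:position] == [event] else -1
        yield score, position == len(events)

def schedule(expectations, events, beam=2):
    """Best first scheduler; assumes `events` is non-empty."""
    pool = {name: interpretation(exp, events) for name, exp in expectations.items()}
    scores = {name: 0 for name in pool}
    while True:
        # Re-choose the active set; the remaining interpretations stay hung.
        active = sorted(pool, key=lambda n: scores[n], reverse=True)[:beam]
        done = []
        for name in active:
            scores[name], finished = next(pool[name])   # one event per cycle
            if finished:
                done.append(name)
        if done:
            return max(done, key=lambda n: scores[n])

events = ["TO", "REPEAT", "FORWARD", "RIGHT", "END"]
expectations = {
    "WW": ["TO", "SETUP", "WELL", "END"],
    "WELL": ["TO", "REPEAT", "FORWARD", "RIGHT", "END"],
    "POLE": ["TO", "FORWARD", "END"],
}
best = schedule(expectations, events)
```

Hung interpretations keep their own position in the event list, so one that is reactivated later resumes where it left off instead of restarting.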






Analysis of a protocol proceeds as follows. First PAZATN requests PATN
to generate its most plausible plan on the basis of the model alone. This plan
is inserted into the planchart. Next, protocol events are examined one by one,
matching them with subgoals in the PATN plan. Each match is recorded in the
datachart.

If an event is encountered for which no plausible match can be found,
PATN is asked to generate its next most plausible plan, now potentially
considering the nature of the mismatch as well as the model. The planchart is
extended by inserting PATN's next plan. Those subgoals which are common to both
plans share the same structure in the chart.
If an event is encountered for which more than one plausible match can be
found, the datachart records each such pairing in similar fashion. Each of these
is then allowed to continue examining protocol events according to the best first
scheduling algorithm.
6. [illegible] Parsing of the Example Protocol

6.1. Preliminary Generation of Expectations

6.2. Modifications Based on Bottom-up Evidence

6.3. Some Informal Observations

This chapter is a hand-simulation of PAZATN on the WISHINGWELL protocol
introduced in Book I. Various components of PAZATN are introduced as they are
needed. Subsequent chapters provide details regarding these components.

6.1. Preliminary Generation of Expectations

The protocol parsing process is initiated by executing (PAZATN
WISHINGWELL WW), where WISHINGWELL is the model and WW the protocol. The initial
answer library is assumed to contain procedures for TRIANGLE and TREE.

Before PAZATN examines the protocol, PATN examines the model. Since
WISHINGWELL is not in the answer library, PATN determines that an identification
plan is not viable. Both decomposition and reformulation are possible, since
they are applicable to any model.

PATN can determine, using lookahead, that reformulation results in an
identification involving TREE; for this particular protocol, this quickly leads
to a successful parse. However, reformulations rapidly expand the search space,
so PAZATN adopts a conservative approach to reformulation: decompositions which
lead to a straightforward solution are preferred unless protocol evidence
indicating reformulation is discovered. Consequently decomposition is predicted,
with three main steps: ROOF, POLE, and WELL. But since the decision is
uncertain, a demon procedure 3 is created to handle the possibility that
decomposition fails to parse the protocol.

The model is examined for interactions. None are detected, so a linear
decomposition into subgoals is expected. However, since required connection
points occur at the midpoints of sides of WELL and ROOF, a non-linear subgoal
decomposition might be used for efficiency, to avoid retracing. As a result, two
demon procedures are created to check for WELL or ROOF sides being accomplished
in two steps. Such a plan is less likely for ROOF which can be identified with
the existing TRIANGLE.

The transitive ABOVE predicates suggest a sequential plan utilizing
either the order {ROOF POLE WELL} or the order {WELL POLE ROOF}. There is no
basis for selection. Hence, PATN follows a principle of least commitment,
predicting the disjunction of the two invocation orders.

This application of the principle of least commitment is accomplished
using a chart of alternative plan derivations called the planchart. The
planchart is similar to an AND/OR goal tree but involves a variety of node types
and shares substructures economically. Figure 11:2 illustrates how the two
equally likely sequences are represented in the planchart. As PATN generates
predictions, the required bookkeeping is performed by expanding this planchart.
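The economical sharing of substructure can be sketched with a small interning table. This is a simplified illustration, not the planchart itself: nodes are plain tuples, the node kind "OR" for the disjunction is an assumed name, the class and method names are hypothetical, and the interfaces between main steps are omitted.

```python
class Planchart:
    """Stores each distinct subplan once so alternatives can share it."""

    def __init__(self):
        self.nodes = {}

    def add(self, kind, label=None, children=()):
        # A node is its own key; adding an equal node returns the stored one.
        key = (kind, label, tuple(children))
        return self.nodes.setdefault(key, key)

chart = Planchart()
roof, pole, well = (chart.add("SOLVE", step) for step in ("ROOF", "POLE", "WELL"))

# The two equally likely invocation orders share the same SOLVE nodes.
order_a = chart.add("SEQ", None, (roof, pole, well))
order_b = chart.add("SEQ", None, (well, pole, roof))

# Least commitment: predict the disjunction rather than choosing an order.
top = chart.add("OR", "WW", (order_a, order_b))
```

Only six nodes are stored for the two alternative sequences, since the three main steps appear once each and are referenced from both orders.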

Since the main steps for the two sequences are identical, they provide no
evidence regarding ordering. The interfaces provide the critical evidence, so
PATN solves the interfaces for one order, {WELL POLE ROOF}. Because the choice
is arbitrary, another demon is created to expand the {ROOF POLE WELL} order in
case the interfaces fail to match. Except that TRIANGLE is already in the answer
library, PATN has predicted the protocol of figure 1:8.

Besides predicting PATN's default solution, three arbitrary choices have
been flagged as likely failure points. If the specific discrepancy pattern
monitored by one of the three corresponding demons is detected, that choice will
be reconsidered. If non-specific mismatches are encountered, backup to other
decisions will occur in the usual way. Note that most choices are not arbitrary
and have not been flagged. (This helps to avoid the usual inefficiency
associated with pure backtracking control structure: "failing in all possible
ways.")

FIGURE 11:2 SIMPLE PLANCHART FOR ALTERNATIVE ORDERS
(node labels: SOLVE (WW), CONJ, SEQ, SOLVE (WELL), SOLVE (INTERFACE 1))

At this point, control passes to the bottom-up analytic routines.

6.2. Modifications Based on Bottom-up Evidence

PAZATN now attempts to interpret the first protocol event in a manner
consistent with PATN's default solution.

    E01 >TO WELL

E01 is classified as a TO event -- Logo punctuation beginning a procedure
definition. The event specialist for TO events is called upon to assign the
event to some expectation.

PAZATN does not use mnemonic clues, and no significance is attached to
the student's particular choice of the name WELL. 9 The TO specialist examines the
planchart (figure 11:2) for candidate subprocedures. There are expectations for
the top level (WW), WELL, POLE and the two interfaces. The default solution
order is top-down, so E01 is assumed to start WW. However, solution order is so
variable that other interpretations are plausible. Consequently the
interpretation splits into separate analyses for each.

Whereas the planchart is used to keep track of alternative expectations,
a second chart, the datachart, is used to keep track of alternative associations
between protocol events and expectations. PATN expands the planchart; PAZATN's
event interpreter expands the datachart. At any given time, some of the partial
interpretations in the datachart are considered to be active; the rest are hung.

For expository purposes, we will assume that only one partial
interpretation is active at a time. Rather than pursuing several alternatives in
parallel, we will merely record them in case the need to back up arises. Hence,
after the split is performed, E01 is assigned to be the TO for the top level
procedure, WW. The parent node of this interpretation, a generator for
alternative interpretations of E01, is hung.

With E01 assumed to start WW, E02 is now processed.

E02 >10 REPEAT 4 [20 30]

This does not match the expectation for a definition of the top level WW
procedure. Therefore backup occurs to the most recent split (at E01). The only
alternative that can account for E02, that E01 is the start of WELL, is activated
(figure II:3).

The protocol matches this new interpretation through E05.

E03 >20 FORWARD 100
E04 >30 RIGHT 90
E05 >END

Ambiguity arises at E06.

E06 >TO WW

Since ROOF can be identified with TRIANGLE and WELL has already been found, this
must be WW, POLE or an interface. The POLE and interfaces are apt to be solved
by in-line code; furthermore, top-down order is the default preference. Hence,
although E06 causes a split, WW is clearly chosen as the active interpretation.
Next, E07 is examined.

E07 >10 TREE

Rather than matching WW's expectations for a setup or a call to WELL, E07
matches the discrepancy pattern for two active demons. One demon represents the
possibility that the {ROOF POLE WELL} order was used; this would require TREE to
be the setup for ROOF. The other demon represents a potential reformulation
involving TREE. This second demon is highly specific for this evidence and is
therefore triggered. Control returns to PATN with a request for a reformulated
model in which TREE is a subgoal.

PATN regroups ROOF and POLE into TREE, and then expands for a solution to
the revised model. When the sequential refinement loop is reached, the
(TREE WELL) invocation order is chosen immediately on the basis of known protocol
data. This need not have been PATN's choice from a problem solving point of
view: this decision is forced by the bottom-up evidence. Figure II:4 shows the
modified planchart.

[Figure II:3 node labels: DEF'N; E02 = REPEAT (ACTIVE)]

FIGURE II:3 DATACHART AT E02 OF WW

After PATN has processed the reformulation request, E07 can be
accommodated, as shown in the datachart of figure II:5. E08 is now examined. An
interface is expected.

E08 >20 WELL

WELL is known to be a previously solved mainstep, violating that expectation.
This is the standard pattern for an incomplete plan: an interface is expected
but instead the next mainstep is found. A demon for incomplete plans is always
active and is triggered by this situation. It passes control to DAPR which
generates debugging expectations.

Each remaining event matches a DAPR expectation.

E09 >END
E10 >?WW
E11 >?EDIT WW
E12 >13 RIGHT 90
E13 >13 FORWARD M
E14 >17 RIGHT 150
E15 >END

Hence the parse succeeds. Figure II:6 shows the final planchart and datachart,
with marked nodes indicating the parse tree which is returned.

6.3. Some Informal Observations

We have hand-simulated PAZATN on about a half-dozen hypothetical
protocols. This informal exercising of the design has led us to a number of
tentative observations regarding PAZATN's capabilities. One question which
arises is PAZATN's flexibility to handle alternative solutions. We are confident
that the following types of variation can be handled:

1. subprocedures versus in-line code;

2. incomplete plans where interfaces are solved by debugging;

3. incorrect plans where interactions are overlooked by the student
(but known by PATN);

4. permutations of invocation or solution order;

5. standard reformulations (regrouping, generic-explicit
conversion);

6. unnecessary nonlinear decompositions (accidental or for
efficiency);

7. non-standard default parameters (FORWARD 75 as the basic unit);

8. simple forms of equivalence (BACK 100 versus FORWARD -100);

9. common errors such as mistyping, or omission of a line number.



On the other hand, the following types of variation pose problems for PAZATN:

1. interleaving of lines from different procedures, if errors also
occur in that a procedure is accidentally edited;

2. unrecognizable reformulations due to gaps in PATN's knowledge;

3. deliberately obscure code, or code involving many needless
operations;

4. equivalence transformations resting on subtle domain theorems;

5. fully general recursion including heterarchical procedure calls.

Another observation concerns PAZATN's efficiency. For the simple
protocols we have considered, after only a few false starts, PAZATN latches onto
a correct set of expectations regarding the student's overall plan. After that
point (which we would place at E08 for this protocol) interpretation of the
remaining events proceeds without incident.

7. Organization of the PAZATN Protocol Parser

7.1. The Planchart
7.2. Representing Interpretations
7.3. The Datachart
7.4. Incremental Planchart Expansion
7.5. Markers and Marker Propagation
7.6. Preprocessing
7.7. The Event Classifier
7.8. The Event Interpreter
7.9. The Event Specialists
7.10. The Scheduler



In generating potential protocol interpretations, PATN is guided not only
by synthetic evidence derived from examining the model, but also by analytic
evidence derived from previously examined protocol events. If previous events
have established that the student is pursuing a particular subgoal, then PATN
will propose candidate solutions for that subgoal, even if it is not one which
arises in PATN's preferred plan. Likewise if previous events have established
that the student is pursuing a particular invocation order, then PATN will use
that order in creating interfaces, even if another sequence leads to simpler
interfaces. This sensitivity to the student's plan is accomplished by adding
additional predicates to PATN's arcs which access assertions in the current
partial interpretation.

This chapter presents the major PAZATN modules needed to use PATN in this
analytic role. Chapter eight refines the discussion presented here.

7.1. The Planchart 

PATN is an intensional representation of the plan space; there are a
number of reasons for needing an extensional representation of the ATN process.
Consequently a complete trace of PATN's operation, the planchart, is maintained.
One reason for creating this data structure is to avoid repetitive calculations,
but additional uses for the planchart will appear in the course of the
discussion. (Figure II:4 shows an example planchart from the analysis of WW.)

The planchart includes not only plans, but nodes of other types such as
debugging episodes. As its name suggests, the planchart is a chart [Kay 1973;
Kaplan 1973], a network which compactly represents alternative combinations of
subexpressions. This economically represents PATN's partial solutions and their
hierarchical annotation. Rather than generating the entire solution space at
once (which would be impractical even if it happened to be finite), PATN
expands this planchart incrementally as additional possibilities are needed by
the analyzer.

Looking upward from a given leaf, the planchart resembles an AND/OR goal
tree. However, there is a greater variety of node types, rather than just AND
and OR. This allows the planchart to represent such concepts as whether
conjunctive subgoals need to be accomplished in a specified order, or whether any
order will do, allowing a greater variety of potential interpretations to be
expressed parsimoniously.

The analysis process is closely tied to modifications of this data
structure. In particular, the structural description assigned to a protocol
corresponds to a pathway through the planchart starting from the root (the top
level SOLVE node) to the individual protocol events corresponding to a subset
of the leaves. The semantic variables and pragmatic assertions are generated by
PATN along with the parse, and are attached to the corresponding planchart
nodes. Consequently, the structure building actions of the protocol parser are
performed entirely by PATN.

7.2. Representing Interpretations

An interpretation of an event is represented as an assignment of that
event to a leaf of the planchart (figure II:7). Similarly, an interpretation of
the protocol is a complete association list of such event assignments. A partial
interpretation is an association list containing assignments for a subset of the
events in the complete protocol.

Because of the chart representation of plans, individual events can be
assigned to a single leaf but remain ambiguous as to which plan they belong to.
The assignment captures exactly what can be concluded from the event: no more
and no less. All possible interpretations consistent with the data are carried
along.

In order to be assigned to a given leaf of the planchart, it is not
necessary for the protocol event to match identically. Data events are converted
to canonical form before assignment, so that equivalent forms (e.g., LEFT 90 and
RIGHT 270) are not distinguished. Non-equivalent assignments are also possible,
representing the analyzer's judgment that the protocol event was intended to
match the planchart leaf but contains either errors, such as misspellings or
mistypings, or different default parameters where a range of values is
acceptable.
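
The conversion to canonical form can be sketched as follows. This is only an illustration in Python, not part of the PAZATN design: the function name `canonicalize` and the choice of FORWARD and RIGHT as the canonical operators are assumptions made here, and only the two equivalences mentioned in the text are covered.

```python
def canonicalize(event: str) -> tuple:
    """Map a one-argument Logo turtle command to a canonical (operator, value) pair.

    Hypothetical helper: FORWARD and RIGHT are arbitrarily chosen as canonical.
    """
    op, arg = event.split()
    value = float(arg)
    if op == "BACK":              # BACK n is equivalent to FORWARD -n
        return ("FORWARD", -value)
    if op == "LEFT":              # LEFT n is equivalent to RIGHT (360 - n)
        return ("RIGHT", (-value) % 360)
    if op == "RIGHT":             # normalize the angle into [0, 360)
        return ("RIGHT", value % 360)
    return (op, value)
```

With this sketch, `canonicalize("LEFT 90")` and `canonicalize("RIGHT 270")` yield the same pair, so the two forms would be assigned to the same planchart leaf.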


7.3. The Datachart 

A partial interpretation splits when it proposes more than a single
planchart assignment for an event. Some method for keeping track of the
analyzer's alternative partial interpretations is needed. It should take
advantage of the fact that, following a split, the event interpretations prior to
that split remain the same: the common ancestry should be preserved. Ideally
interpretations which share events both before and after a split should share
the same representation for them; this is called a join.

Planchart:
SOLVE (WELL) -- PLAN -- DEC -- REP -- REPEAT 4
                                   -- SEQ -- FORWARD 100
                                          -- RIGHT 90

Protocol:
E02 REPEAT 4 [20 30]
E03 FORWARD 100
E04 RIGHT 90

E03 has been assigned to the planchart generic side for WELL.

FIGURE II:7 INTERPRETING AN EVENT

The datachart serves these functions. Like the planchart, the datachart
is a chart, so that it can economically store common substructure. Suppose that
two interpretations have identical assignments for the first N events, and then
split. The split corresponds to a single node having two descendants.
Assertions corresponding to the shared part of the interpretation are
automatically inherited from the parent node (figure II:8).

Whenever a low plausibility event assignment occurs the following actions
are performed:

1. An assertion is added at the current node, indicating which event
assignment is about to be made. This ensures that the same
possibilities will not be repeatedly pursued.

2. A new node is sprouted, which will inherit prior assignments from
the parent node. This ensures that changes which reflect the
uncertain assignment will not affect the state information of
the parent node.

3. The uncertain assignment is performed at the new node. The
normal operations associated with event interpretation
(described below) are carried out.

4. The new node is placed on a list of NEW partial interpretations.
This ensures that it will be scheduled for at least one cycle of
further investigation.

5. The parent node is re-examined to determine if additional nodes
should be sprouted representing alternative event assignments.
If so, the above sequence of operations is carried out for each.
When no further alternatives seem worth considering at the
present time, the parent node is placed on a list of HUNG
interpretations.
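
The five steps above can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the class `DataNode`, the function `sprout`, and the representation of the NEW and HUNG lists as plain Python lists are all hypothetical, and the re-examination of the parent in step 5 is simplified to a single pass over a supplied list of candidate leaves.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class DataNode:
    """One datachart node (hypothetical structure, for illustration only)."""
    parent: Optional["DataNode"] = None
    assignments: dict = field(default_factory=dict)  # event -> leaf, local to this node
    tried: set = field(default_factory=set)          # assertions recorded in step 1
    status: str = "ACTIVE"

    def lookup(self, event):
        """Return the assignment made here or inherited from an ancestor."""
        node = self
        while node is not None:
            if event in node.assignments:
                return node.assignments[event]
            node = node.parent
        return None


def sprout(parent, event, candidate_leaves, new_list, hung_list):
    """Sprout one child per untried candidate assignment, then hang the parent."""
    for leaf in candidate_leaves:
        if (event, leaf) in parent.tried:
            continue                                  # never pursue the same possibility twice
        parent.tried.add((event, leaf))               # step 1: record the assertion
        child = DataNode(parent=parent, status="NEW") # step 2: child inherits via lookup()
        child.assignments[event] = leaf               # step 3: uncertain assignment at the child
        new_list.append(child)                        # step 4: schedule for one cycle
    parent.status = "HUNG"                            # step 5: no further alternatives for now
    if parent not in hung_list:
        hung_list.append(parent)
```

Because each child stores only its own assignment and looks everything else up through its parent, the shared ancestry is stored once, and the parent's state is untouched by what its children assume.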

This technique has the feature that it is not necessary to explicitly
list all of the possible alternative interpretations for a given event. After
sprouting, the parent node no longer represents a single partial interpretation,
but an indefinite number of implicit alternatives to its current offspring. Even
after it is HUNG, the parent node contains the necessary state information to
generate additional possibilities if these are ever needed.

[Figure II:8 node labels: SPLIT; E02=? (HUNG); E01=WELL DEFINITION; E02=REPEAT;
E07=SETUP WELL; E07=TREE CALL; E08=WELL CALL]

The hypothesis that E01 starts the definition of WELL can be "seen" from the
node for E08. Two possible explanations of E07 can also be seen.

FIGURE II:8 INHERITANCE OF DATACHART ASSERTIONS

7.4. Incremental Planchart Expansion

Consider the situation in which an active partial interpretation cannot
find an acceptable planchart assignment for its next event. Two conclusions are
possible: either (a) the current partial interpretation is a dead end, and
should be moved to the HUNG list; or (b) the current partial interpretation is
viable, but the planchart has not been expanded sufficiently to account for the
current data.

This decision is crucial. If PAZATN is too miserly in allowing planchart
growth, an event could be mis-interpreted as a deviant version of an existing
leaf, where only slight growth would have allowed it to match a new leaf exactly.
But if PAZATN is too eager to expand the planchart, the number of irrelevant
solutions proposed could be enormous.

This decision is also very difficult, being complicated by the
circumstance that data events need not identically match planchart leaves: they
can differ because of postulated bugs or variant but acceptable parameter values
(such as scale factors).



Four techniques are germane to this decision and its complications.

1. Protocol events are converted to a canonical form. This allows
for handling simple forms of equivalence such as FORWARD -100
versus BACK 100.

2. Standard spelling correction procedures 7 are applied to
unrecognized protocol events, using the fringe of the planchart
as a dictionary. This allows for handling simple mistypings and
misspellings.

3. A hash coding scheme uses the critical terms of an event (e.g.,
the FORWARD, but not the 100) as keys. 1 This allows acceptable
variants of events (e.g., those differing only by a scale
factor) to be located.

4. The neighbors of a planchart leaf provide expectations which
influence the plausibility of event assignments to that leaf.
The next section describes a scheme for generating these
expectations.
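
Techniques 2 and 3 can be illustrated with a short Python sketch. It is not the authors' procedure: the report does not say which spelling correction method is meant, so the standard library's `difflib.get_close_matches` is swapped in here as a stand-in, and the helper names `event_key`, `index_fringe` and `correct_operator`, as well as the 0.8 similarity cutoff, are assumptions.

```python
import difflib


def event_key(event: str) -> str:
    """Critical term of an event: the operator, ignoring parameter values."""
    return event.split()[0]


def index_fringe(fringe):
    """Hash the planchart fringe by critical term (technique 3)."""
    table = {}
    for leaf in fringe:
        table.setdefault(event_key(leaf), []).append(leaf)
    return table


def correct_operator(event: str, fringe) -> str:
    """Replace an unrecognized operator by the closest one on the fringe (technique 2)."""
    words = event.split()
    vocabulary = sorted({event_key(leaf) for leaf in fringe})
    close = difflib.get_close_matches(words[0], vocabulary, n=1, cutoff=0.8)
    if close:
        words[0] = close[0]
    return " ".join(words)
```

For a fringe containing FORWARD 100, RIGHT 90 and REPEAT 4, the mistyped event FORWARP 100 is corrected to FORWARD 100, and an event such as FORWARD 75 hashes to the same bucket as FORWARD 100, so the leaf it might be a scaled variant of can be located without searching the whole planchart.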

7.5. Markers and Marker Propagation

A marker propagation technique helps in deciding whether to expand the
planchart by providing a precise representation for expectations. Markers also
determine the final protocol parse by selecting a pathway through the planchart.
Assigning a protocol event to a planchart leaf marks that leaf. Three types of
markers are used: (1) a standard marker for events that match identically or
differ only in a flexible parameter value; (2) a distinguished marker for
top-down DEFINED plans prior to encountering the body of the subprocedure; and
(3) a distinguished marker for deviant events involving mistyping or similar
errors.

A constituent is expected to the extent to which finding it results in
propagations, where propagation through the planchart is characterized by rules
such as:

MPR-DISJ. If the parent of a marked node is disjunctive (i.e.,
split), the parent is marked;

MPR-CONJ. If the parent of a marked node is conjunctive and
every sibling of the marked node is marked, the parent is
marked.

The rules shown here are incomplete. Top-down DEFINED plans, for
example, receive special treatment to ensure that after completing a
superprocedure the expectations for its subprocedures remain in effect.

As an example of the use of these rules, consider a bottom-up DEFINED
plan, where a subprocedure is first defined and then called by a superprocedure.
After the subprocedure definition has been encountered, its use by some
superprocedure is expected. The planchart would contain a marked SOLVE node for
the subprocedure and an unmarked USE node for its use in the other procedure,
both dominated by an unordered conjunctive "&" node (figure II:9). The USE is
expected because marking its node would result in a propagation at least as far
as the SOLVE dominating the DEFINED node.
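
The two propagation rules, applied to the bottom-up DEFINED example, can be sketched as follows. The Python class `Node`, the function `mark`, and the reduction of node types to "OR" (disjunctive) and "AND" (conjunctive) are simplifying assumptions made for this illustration; the three marker types and the special treatment of top-down DEFINED plans are not modeled.

```python
class Node:
    """Planchart node reduced to a kind ("LEAF", "OR", "AND") and a mark bit."""

    def __init__(self, label, kind="LEAF", children=()):
        self.label = label
        self.kind = kind
        self.children = list(children)
        self.parent = None
        self.marked = False
        for child in self.children:
            child.parent = self


def mark(node):
    """Mark a node, propagate upward, and return the number of propagations."""
    node.marked = True
    steps = 0
    parent = node.parent
    while parent is not None and not parent.marked:
        disjunctive = parent.kind == "OR"
        all_siblings_marked = all(child.marked for child in parent.children)
        if disjunctive or all_siblings_marked:   # disjunctive rule / conjunctive rule
            parent.marked = True
            steps += 1
            parent = parent.parent
        else:
            break                                # an unmarked sibling blocks propagation
    return steps
```

In this sketch the length of the chain returned by `mark` is the quantity that makes a constituent "expected": marking the SOLVE for the subprocedure alone propagates nowhere, while marking the USE afterwards propagates through the "&" node and the DEFINED node up to the dominating SOLVE.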

Suppose that an expectation (such as the bottom-up DEFINED plan example)
fails to be satisfied after many events. One possibility is that the partial
interpretation which expects it is completely wrong, and should be abandoned. A
second possibility is that the partial interpretation is basically correct, but
the student has accomplished the expected effect in an alternative way (e.g.,
incorporated the subprocedure's definition in-line instead of calling it as
expected). This second case turns off the expectation, since it becomes
dominated by a marked node (figure II:10).

A third possibility is that the student accidentally left out the
relevant line of code. This is detected when protocol events indicate that the
episode is finished. In the Logo world this corresponds to encountering the END
statement for the superprocedure. END statements force propagations even when
some expectations are not satisfied; but the plan is flagged as incomplete,
debugging expectations are generated, and the plausibility is lowered. If the
debugging predictions are then confirmed, the plausibility is restored and the
expectation considered satisfied.

Markers, as a representation for expectations, provide evidence regarding
the plausibility of interpretations, which is especially useful when planchart
expansion is under consideration. Typical plausibility guidelines include:

PLG-1. Event assignments that result in longer chains of propagations
are more plausible than those that result in shorter chains of
propagations or none at all.

PLG-2. Interpretations that leave few expectations unsatisfied are more
plausible than those that leave many expectations unsatisfied.



[Figure II:9 node labels: SOLVE (*WELL*); DEFINED; SOLVE (WELL) (already
marked); USE (WELL)]

A use of WELL is expected because it would cause the propagations shown.

FIGURE II:9 DEFINED PLANS: AN EXAMPLE OF PROPAGATION AS EXPECTATION

[Figure II:10 planchart node labels: SOLVE (WW); SEQ; SOLVE (*WELL*);
SOLVE (WELL); USE (WELL). Protocol: TO WW; in-line code for WELL; END]

This use of WELL is no longer expected, since it is now dominated by a marked
node.

FIGURE II:10 EXPECTATION CANCELLED, DOMINATED BY MARKED NODE
7.6. Preprocessing

PAZATN includes a preprocessor which performs four functions.

1. Low level syntactic anomalies such as typographical errors
corrected using the RUBOUT and BREAK keys are filtered out;
only the corrected versions of such events are examined.

2. Low level segmentation clues are noted. For example, with raster
scan TV TURTLES [Lieberman 1976] global connectivity of vectors
is readily detectable and suggests a segment boundary.

3. Timing data are collected. This information may be of value in
testing psychological claims, and in some instances the
plausibility of an interpretation depends upon the elapsed time
between type-ins.

4. The primary syntactic class of each event is recorded to avoid
recomputing it under each interpretation. Classification is
performed by a separate module which can be re-invoked if the
primary class is later called into question.

7.7. The Event Classifier

The event classifier, one of the few PAZATN modules which must be
redefined for each domain, contains the syntactic knowledge necessary to
distinguish various domain-specific event types. For the programming world, the
event types include RUN events, EDIT events, and so on. In assigning an
interpretation to an event, a variety of semantic and pragmatic evidence is
ultimately considered by PAZATN, but the event classifier is restricted to
syntactic evidence.

The event classifier can be invoked in three modes. Normally it is
invoked by the preprocessor, with its input an event and its output the event's
primary syntactic class; for most events, this is sufficient. In the second
mode it is invoked by partial interpretations which question the primary
syntactic class, with a specific alternative class being considered. Here its
input is an event and a class name; its output is a numerical score summarizing
the syntactic evidence supporting the alternative class. In the third mode the
classifier is invoked by partial interpretations which question the primary
syntactic class but with no specific alternative class under consideration. An
exhaustive rank ordered list of categories and scores is returned.

Event classification will be performed using straightforward pattern
matching. No further details are given here.
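
Since the report gives no further details, the following Python sketch only illustrates the shape of the three calling modes. Everything in it is an assumption: the regular expressions, the all-or-nothing scores (1.0 or 0.0), and the function names are invented for this example, and only the class names TO, EDIT, RUN and ADDCODE are taken from the text.

```python
import re

# Hypothetical syntactic patterns for a few Logo event classes.
PATTERNS = {
    "TO": re.compile(r"^TO\s+\w+"),                 # start of a procedure definition
    "END": re.compile(r"^END$"),                    # end of a procedure definition
    "EDIT": re.compile(r"^EDIT\s+\w+"),             # request to edit a procedure
    "ADDCODE": re.compile(r"^\d+\s+\w+"),           # numbered line of code
    "RUN": re.compile(r"^[A-Z]\w*(\s+-?\d+)*$"),    # direct execution of a command
}


def score(event: str, cls: str) -> float:
    """Crude syntactic evidence: 1.0 if the class pattern matches, else 0.0."""
    return 1.0 if PATTERNS[cls].match(event.strip()) else 0.0


def classify(event: str, cls=None, exhaustive=False):
    """Mode 1: primary class. Mode 2: score for one class. Mode 3: ranked list."""
    if cls is not None:                              # mode 2
        return score(event, cls)
    ranked = sorted(((score(event, c), c) for c in PATTERNS), key=lambda pair: -pair[0])
    if exhaustive:                                   # mode 3
        return [(c, s) for s, c in ranked]
    return ranked[0][1]                              # mode 1
```

Ties are broken by the order in which the patterns are listed, which is why END is preferred over RUN for the event END even though both patterns match it.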

7.8. The Event Interpreter

The event interpreter is responsible for category independent operations
of event interpretation. This includes the node sprouting sequence described in
the datachart section, the processing required for marker propagation, and the
plausibility computations. The rationale for grouping these activities is
modularity: they are required for every category of event interpretation.

The event interpreter is PAZATN's inner loop. It is invoked by the
scheduler with two arguments: a partial interpretation, and a protocol event.
It attempts, in cooperation with one or more event specialists, to account for
the protocol event in the context of the partial interpretation. This can result
in the creation of additional (descendant) partial interpretations. Control
returns to the scheduler when event interpretation is complete.

7.9. The Event Specialists 

A collection of domain specific event specialists (ESP's) are responsible
for category dependent operations of event interpretation. Each specialist
contains the requisite knowledge for analyzing events of a particular syntactic
type. The event interpreter invokes an ESP, in the context of a partial
interpretation, with an event and an implicit assumption regarding its syntactic
category. The specialist is free to assign any interpretation to the event which
is consistent with the category.

If the event specialist does not return with a sufficiently plausible
event assignment, the event interpreter then considers the possibility that the
syntactic category postulated for the event is incorrect. Whenever an event is
interpreted as being in error, expectations for diagnosis and repair are
generated by DAPR at the request of the event interpreter.

ESP's use a decision tree organization to factor the analysis into
several cases. Each case represents an assumption about intent; if the
assumption is uncertain, the state of the interpretation is preserved by
sprouting a new datachart node. This is exemplified by the Logo ADDCODE ESP,
whose flowchart appears in figure II:11. 10

The ADDCODE classification assumes that the current event is intended to
add a new line of code to a procedure definition. Hence it must be determined
whether Logo is actually in definition mode. If not, the following event will be
an error message. If the ADDCODE assumption is correct despite the error, the
current event will be repeated after a TO event.

Lookahead is required to assign the current event to be an erroneous
version of a later event. However, in a real time tutoring application, the
later event might not have occurred yet; moreover, processing more than one
event would exceed the scheduler's resource allocation. This dilemma is resolved
by creating a demon to represent the current event assignment. The demon will
fire when the future event is assigned, assigning the now current event to be a
deviant version of the later event.

In the case where Logo is in definition mode, ADDCODE branches to one of
the following subcases: (a) the added code is a turtle primitive; (b) the added
code is a Logo control statement (such as a recursion or iteration line); (c)
the added code is a call to a user procedure other than a recursion line.

[Figure II:11 flowchart labels: ASSERT ERROR; YES; RETURN DEMON PROCEDURE;
ASSIGN AS IN-LINE CODE]

The flow of control is interrupted here if plausibility falls below threshold.

FIGURE II:11 FLOWCHART FOR ADDCODE ESP
7.10. The Scheduler

The scheduler's task is to cause partial interpretations which have a
reasonable likelihood of succeeding to make progress, and prevent those that are
likely to fail from consuming valuable resources. Operationally this means
driving PAZATN through a best first coroutine search of the space of partial
interpretations.

The search is accomplished by maintaining three lists of partial
interpretations: NEW, ACTIVE, and HUNG. The scheduler cycles through the ACTIVE
list, allowing each item to process one protocol event. Then the plausibility of
each modified interpretation is recomputed, and the ACTIVE and HUNG lists are
re-chosen. NEW interpretations, which result from the splitting of ACTIVE
interpretations on the previous cycle, are then moved to the ACTIVE list,
guaranteeing them at least one quantum of processing. The plausibility of a
partial interpretation increases with each additional event accounted for. (This
acts to decrease the relative plausibility of older HUNG interpretations.)

This process continues until at least one ACTIVE interpretation has
processed the last input event without unsatisfied expectations. If the first
successful interpretation is not sufficiently better than every other candidate,
some of the better alternatives are pursued until they become implausible or
determine that the protocol may successfully be interpreted in more than one way.
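
The scheduling cycle can be sketched in Python as below. This is a simplified illustration, not the PAZATN scheduler: a partial interpretation is represented as a tuple with one assignment per event processed, `step` and `plausibility` are caller-supplied stand-ins for the event interpreter and the plausibility computation, an absolute `threshold` replaces the relative re-choosing of ACTIVE and HUNG, and the search stops at the first successful interpretation rather than pursuing close competitors.

```python
def schedule(events, step, plausibility, threshold=0.0, max_cycles=1000):
    """Drive partial interpretations through the protocol, one event per cycle.

    step(interp, event) returns the list of descendant interpretations
    (empty for a dead end, more than one for a split).
    """
    active, hung = [()], []
    for _ in range(max_cycles):
        finished = [i for i in active if len(i) == len(events)]
        if finished:
            return max(finished, key=plausibility)
        new, modified = [], []
        for interp in active:                        # one quantum per ACTIVE item
            children = step(interp, events[len(interp)])
            (new if len(children) > 1 else modified).extend(children)
        candidates = modified + hung                 # re-choose ACTIVE and HUNG
        active = [i for i in candidates if plausibility(i) >= threshold]
        hung = [i for i in candidates if plausibility(i) < threshold]
        active += new                                # NEW always gets one quantum
        if not active:
            if not hung:
                return None                          # nothing left to pursue
            best = max(hung, key=plausibility)       # back up to the best HUNG item
            hung.remove(best)
            active = [best]
    return None
```

Using the number of events accounted for as the plausibility reproduces the remark in the text that older HUNG interpretations lose ground relative to interpretations that keep making progress.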
8. Refining the Protocol Parser

8.1. Lookahead
8.2. Least Commitment
8.3. Differential Diagnosis

Our basic protocol parsing scheme is to generate expectations with PATN
and then try to match these expectations to a protocol. This process is refined
by several techniques which have enhanced the effectiveness of problem solving
and language processing programs: lookahead (e.g., [Aho & Ullman 1972]), least
commitment (e.g., [Sacerdoti 1975]) and differential diagnosis (e.g.,
[Rubin 1975]).

8.1. Lookahead

Lookahead and least commitment are related search strategies designed to
avoid premature decisions based on inadequate evidence, which can result in
needless backup. Lookahead consists of briefly examining subsequent input events
before interpreting the current event.

PAZATN can accomplish a limited form of lookahead by using demon
procedures to represent event assignments. When the current event assignment
depends upon a future event assignment, a demon is created which will complete
the current assignment when the missing evidence from the future assignment is
available.
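
A demon of this kind can be sketched in Python as a deferred assignment with a trigger. The classes `Demon` and `Interpretation`, the representation of events as (name, text) pairs, and the tag "DEVIANT-VERSION-OF" are assumptions made for this illustration only.

```python
class Demon:
    """A pending assignment that completes when a later event satisfies `trigger`."""

    def __init__(self, pending_event, trigger):
        self.pending_event = pending_event
        self.trigger = trigger          # predicate applied to each later event


class Interpretation:
    """Partial interpretation holding assignments and its waiting demons."""

    def __init__(self):
        self.assignments = {}
        self.demons = []

    def defer(self, event, trigger):
        """Postpone the assignment of `event` until the trigger fires."""
        self.demons.append(Demon(event, trigger))

    def assign(self, event, leaf):
        """Assign an event, then fire any demon waiting for it."""
        self.assignments[event] = leaf
        for demon in list(self.demons):
            if demon.trigger(event):
                # Complete the earlier assignment using the later evidence.
                self.assignments[demon.pending_event] = ("DEVIANT-VERSION-OF", event)
                self.demons.remove(demon)
```

For instance, a line of code typed outside definition mode can be deferred with a trigger that waits for the same line to be retyped; no event beyond the current one has to be examined at the time the demon is created.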

8.2. Least Commitment

Variability in solution order exemplifies the need for avoiding premature
commitments. PATN always defines the top level plan before expanding
subproblems, representing strict top-down problem solving (figure II:12). But
human programming is rarely this uniform. When the need for a particular subgoal
has been established, it may be expanded immediately, prior to completing the top
level plan, representing bottom-up problem solving (figure II:13).

Planchart:
SOLVE (WW)
  SEQ
    SOLVE (*ROOF*)  DEFINED: USE, SOLVE (ROOF)
    SOLVE (*POLE*)  DEFINED: USE, SOLVE (POLE)
    SOLVE (*WELL*)  DEFINED: USE, SOLVE (WELL)

Protocol:
?TO WW
>10 ROOF
>20 POLE
>30 WELL
>END
?TO ROOF
>END
?TO POLE
>END
?TO WELL
>END

FIGURE II:12 A TOP-DOWN EXPANSION FOR WISHINGWELL

Least commitment helps to minimize misleading mismatches between
planchart and protocol resulting from different solution orders. This is
accomplished by using unordered conjunctive "&" nodes in the planchart. Thus
when DEFINED plans are expanded to USE & SOLVE, the SOLVE may occur prior to the
USE with no loss in plausibility.

The least commitment policy is applied to variability in invocation order
as well. When, as was the case with WW, more than one invocation order is
acceptable, the planchart is split. This parallels the use of procedural nets
[Sacerdoti 1975] to avoid overspecifying ordering constraints (figure II:14).
The chart data structure allows the ambiguity to be represented without
significant additional cost: if the mainsteps are identical for both orders,
then two copies will not be stored.

Despite its virtues, though, least commitment could be overdone,
resulting in so large a disjunction of expectations that no guidance would be
obtained. PAZATN strikes a balance between overcommitting itself and refusing to
take decisive action: it avoids arbitrary choices in the course of a given
decomposition strategy, but adheres to a given formulation of the model unless
required to change it by specific bottom-up evidence.

8.3. Differential Diagnosis

The use of demon procedures to implement lookahead was discussed earlier.
Another use of demons is to perform differential diagnosis, using highly specific
clues to distinguish between similar competing interpretations. The primary
application of differential diagnosis demons is to the choice between assigning
an event to one of an existing disjunction of expectations, and reformulating the
problem description in response to bottom-up evidence.



so 



ERIC 



Protocol Analysis 



2.37 



Miller & Goldstein 



\ 



soia^e (Ww) 



/ SEQ 

SOLVE ( *R~OOF*T 
DEFINED ' 



SOLVE (ROOF) 



USE 



SOLVE ( *POLElt) 
DEFINED 

SOLVE (POLE) 



USE 



SOLVE (*WELL*) 
DEFINED 

SOLVE (WELL) 



r 



USE- 



?TO ROOF 
>END S 

— \ 

?TO POLE 
>END 

?TO WELL 
>END 



?TO WELL 
>10 ROOF 
->20 POLE 



->30 WELL 



>END 



FIGURE 11:13 BOTTOM-UP EXPANSION FOR WISHINGWELL 
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FIGURE^ II; 14 A PROCEDURAL NET FOR BUILDING A TOWER. 



AFTER CRITICISM TO kESOLVE CONFLICTS 
[BASED ON SACERDOTI , 1975, p. 15} 
To illustrate this, we present one example of a complementary pair of
demon templates. These templates can be instantiated to realize differential
diagnosis behavior in specific situations.

A. If the current code segment (or its picture) matches a
disjunctive subset of the current expectations, select that
subset.

B. If no expectation matches the current code segment (or its
picture), consider a reformulation using the segment's effect as
a subgoal.
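
The complementary pair of templates can be sketched as a single dispatch routine. This is a hypothetical Python rendering, not the demon mechanism of the design: the function names, the representation of expectations and segment effects as plain strings, and the default equality test for "matches" are all assumptions.

```python
def diagnose(segment_effect, expectations, matches=lambda effect, exp: effect == exp):
    """Apply templates A and B to one code segment (illustrative only).

    segment_effect: what the current code segment (or its picture) achieves.
    expectations:   the current disjunction of expected subgoals.
    matches:        assumed matching test; plain equality by default.
    """
    matched = [exp for exp in expectations if matches(segment_effect, exp)]
    if matched:
        # Template A: narrow the disjunction to the matching subset.
        return ("select", matched)
    # Template B: nothing matches, so propose a reformulation that
    # adopts the segment's effect as a subgoal.
    return ("reformulate", segment_effect)


expectations = ["ROOF", "POLE", "WELL"]
selected = diagnose("POLE", expectations)
reformulated = diagnose("TREE", expectations)
```

Here `selected` is `("select", ["POLE"])`, while `reformulated` is `("reformulate", "TREE")`, since TREE matches none of the three expectations.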
9.1. Recapitulation
9.2. Implementation Plans

9.1. Recapitulation

In this report we have investigated the problem of analyzing elementary
problem solving protocols. The result of this investigation is the design for
PAZATN, a domain independent protocol parsing scheme, which was applied to the
Logo graphics programming domain. Coupled with the Logo ESP's, the design was
sufficiently well-specified that PAZATN could be hand-simulated for a simple
example with encouraging results. The foundation for the approach was SPADE, a
linguistic theory of design in which problem solving is viewed as a structured
process of planning and debugging. This led us to the definition of an
interpretation as a parse tree augmented by semantic and pragmatic annotation
associated with each node.

A key ingredient in the design is a machine problem solver called PATN.
PATN employs an augmented transition network to represent fundamental planning
concepts, including techniques of identification, decomposition, and
reformulation. Considerable leverage is obtained from PATN's ability to generate
successively less preferable solution paths, by a series of pragmatically guided
planning decisions, as well as from PATN's characterization of certain bugs as
errors in these planning choices.

We found an analogy to computational linguistics to be fruitful,
providing insights into data representations and search strategies which are
characteristic of research in syntactic analysis [Kay 1973; Kaplan 1973] and
speech recognition (e.g., [Lesser et al. 1975; Paxton & Robinson 1975]). For
example, the chart representation is used to economically store well-formed
substructures. Lookahead, least commitment, and differential diagnosis are
example strategies used to refine PAZATN's search for a parse. These allow for
proceeding on the basis of reasonable assumptions when necessary, while retaining
the ability to modify the interpretation in response to anomalies.

The analysis procedure has been designed to obtain maximal advantage from
both top-down guidance from the task description and bottom-up protocol evidence.
Analysis proceeds by a best first coroutine search of a space of partial
interpretations. The planchart, a data structure resembling an AND/OR goal tree,
is used to keep track of expectations. By careful selection of the
representational scheme, this structure achieves considerable storage economy.
Partial knowledge of structure and of the status of expectations is recorded
using a scheme of planchart markings and marker propagations. The planchart is
incrementally expanded by PATN when existing expectations are inadequate in view
of the protocol data. A second chart, the datachart, is used to keep track of
the state of alternative partial interpretations.
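
The best first search over partial interpretations can be illustrated with a small Python sketch. It models only the agenda discipline; the coroutine interleaving with PATN, the planchart, and the datachart are omitted. The function `best_first`, the encoding of an interpretation as a tuple of (event, leaf) assignments, and the penalty table `COST` are hypothetical.

```python
import heapq
from itertools import count


def best_first(initial, expand, is_complete, score):
    """Return the first complete interpretation taken from the agenda.

    The agenda is a priority queue ordered by score (lower is better).
    """
    tie = count()  # tie-breaker so interpretations are never compared directly
    agenda = [(score(initial), next(tie), initial)]
    while agenda:
        _, _, interp = heapq.heappop(agenda)
        if is_complete(interp):
            return interp
        for successor in expand(interp):
            heapq.heappush(agenda, (score(successor), next(tie), successor))
    return None


# Toy problem: assign two protocol events to two plan leaves.
EVENTS = ["e1", "e2"]
LEAVES = ("ROOF", "POLE")
# Hypothetical penalties for assigning an event to a leaf.
COST = {("e1", "ROOF"): 0, ("e1", "POLE"): 2,
        ("e2", "ROOF"): 3, ("e2", "POLE"): 1}


def expand(interp):
    # Assign the next unassigned event to every leaf not yet used.
    event = EVENTS[len(interp)]
    used = {leaf for _, leaf in interp}
    return [interp + ((event, leaf),) for leaf in LEAVES if leaf not in used]


def is_complete(interp):
    return len(interp) == len(EVENTS)


def score(interp):
    return sum(COST[pair] for pair in interp)


best = best_first((), expand, is_complete, score)
```

With these penalties the search returns `(("e1", "ROOF"), ("e2", "POLE"))`. The less favored partial interpretation that assigns e1 to POLE stays on the agenda and would be resumed only if the preferred line of analysis failed.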

Although PAZATN is not yet a working program, the design is sufficiently
specific so as to be hand-simulable. In hand-simulation, there is a danger of
unintentionally drawing upon knowledge which has not been isolated or formalized.
Care was exercised to avoid this pitfall, and the examples are encouraging
evidence that the approach is fundamentally sound. Still, hand-simulation is not
seen as a substitute for implementation. The next phase of the research is to
implement and experiment with a prototype analyzer.

PAZATN is a generalization and extension of previous approaches. PAZATN
grew out of Goldstein's [1974, 1975] plan-finder for MYCROFT. The differences
are that PAZATN: (a) generates interpretations consistent with the recently
developed SPADE theory; (b) handles the wider range of event types necessary to
analyze protocols rather than finished programs; (c) takes advantage of the
dynamic information in these additional event types regarding subgoal structure
and [illegible]; and (d) is not limited to the Logo domain. The SPADE theory
developed from the MYCROFT theory of program understanding as well as related
work by Sussman [197?], Papert [1971] and Sacerdoti [1975]. [Goldstein &
Miller 1976b] argues that SPADE represents progress over this earlier theorizing.
PAZATN also complements the features and limitations of analyzers developed at
Carnegie-Mellon University. The major theoretical advance is a highly structured
model of program synthesis. The major practical advance is the modularization of
domain specific knowledge, which indicates that the PAZATN framework ought to be
applicable to a wide variety of task domains.

* PAZATN is independent of the detailed form of the synthetic formalism:
it does not intrinsically depend on PATN being an augmented transition network.
It is only necessary that the synthetic component plan and debug by making a
series of pragmatic choices which can be summarized by the planchart data
structure, and that it be capable of generating not one, but an entire space of
progressively less favored solution paths. Finally, an implicit assumption runs
throughout the analyzer's design that solutions can be decomposed into syntactic,
semantic, and pragmatic elements. It may be that any synthetic formalism
satisfying these constraints is trivially equivalent to an ATN. Such questions
are notoriously difficult to settle.

However, an important issue in the design is the breadth of the synthetic
theory. There are of course particular omissions such as conditional plans,
which have been deliberately (but only temporarily) ignored. The greater
threat comes from the unknown. Even very young children display incredible
richness in their problem solving. Although SPADE's origins are partly
empirical, crucial phenomena, perhaps those most in need of investigation,
may have been overlooked. This remains a topic for investigation.
9.2. Implementation Plans

There are several ways in which an apparently sound design could fail in
implementation. The space of partial interpretations could turn out to be very
large relative to the sources of constraint which have been isolated. The
variety of knowledge used by human programmers could greatly surpass our current
estimates. PAZATN's storage requirements could exceed practical bounds. The
analyzer could be too rash in its heuristic quest for efficiency, terminating
prematurely with unacceptable interpretations. Too great a demand could be
placed on PATN's ability to find reformulations encompassing bottom-up evidence.
Hand simulations or even partial implementations could overlook such impediments.

Consequently, complete implementation of a prototype system is essential
for validating the research. We intend to perform this implementation
incrementally, beginning with an interactive version. At first, only
straightforward bookkeeping functions will be automated. The computer will
record plans and event assignments using the two charts, but decisions regarding
which interpretations to pursue will be made by a human investigator. This will
be replaced by a version which performs the routine analysis of most events, only
requesting help on more difficult cases. Eventually, the analysis will be
handled completely by the machine. A modeling component for inducing
personalized ATN's will be implemented, and its predictive power explored. To
demonstrate PAZATN's understanding of the protocols, a question answering module
using a formal query language will be constructed to operate over PAZATN's
output.
10. Notes to Book II

1. [Brown et al. 1977] includes a chapter indicating the direction of
our efforts to apply PAZATN to the symbolic integration domain.

2. The one exception to this is that some irrational errors (such as
mistypings) can be recognized as unsuccessful manifestations of PATN plans.

3. This use of demons is inspired by Charniak's [1972] work on story
understanding. Demons are a type of antecedent theorem [Hewitt 1972].

4. The chart data structure is due to Kay [1973] and Kaplan [1973].
Generalized AND/OR graphs [Levi & Sirovich 1976] are data structures, similar to
our plancharts, derived on the basis of independent formal considerations. We
think that, because of the extension from trees to charts as well as the
incorporation of a larger variety of node types, our plancharts are an
improvement over generalized AND/OR graphs along the dimensions of generality,
storage economy, and expressive power.

5. We intend to provide PAZATN with limited heuristics for recognizing
mnemonic identifiers. However, relying on user chosen names for guidance in
general would be too unreliable. Hence, to emphasize that such guidance can be
dispensed with, we assume here that procedure names are unrecognizable.

6. In assigning a protocol event to a planchart leaf, the type of event
and the value of :MODEL are considered, but the other semantic variables and the
pragmatic assertions are generally not considered. This is a simplification
which ignores the possibility for complex semantic and pragmatic ambiguities.
For example, two interpretations might be identical except for the value of
:ADVICE at some node. Although this difficulty seems unlikely, PAZATN could be
elaborated slightly to handle it. Here we ignore the problem and show only the
structure description and the name of the submodel in our diagrams. (The other
variables and the pragmatic assertions, being assumed unambiguous, are
suppressed.)

7. Interlisp [Teitelman 1974, pp. 17.10-17.14] provides such a spelling
corrector. See also [Teitelman 1970].

8. Such techniques are in common use. See, for example, [Greenblatt et
al. 1967, pp. 806-807].

9. For example, if much more than the typical time elapses between the
type-in of two consecutive events, it is more plausible to interpret the second
event as initiating a new episode. A more specific example involves the frequent
errors associated with Logo line numbers. There are two such errors: (1)
failing to include a line number when it is needed; and (2) accidentally
including a line number when it is not needed. Consider the second. If much
more than the typical time elapses between the type-in of the line number and the
type-in of the remainder of the event, it becomes more plausible to interpret the
event as a buggy RUN event rather than a legal but inexplicable EDIT event.
Rather than storing every value, however, the preprocessor will accumulate
summary statistics, only recording the specific data for type-ins which are
markedly slower than the average.



10. Information concerning other Logo event specialists as well as
additional parsed protocols will be supplied by the authors upon request.
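
The timing bookkeeping described in note 9 can be sketched as follows. This Python fragment is a hypothetical illustration, not the planned preprocessor: the class name `TypeInTimer`, the use of a running mean and variance (Welford's method), and the cutoff of two standard deviations for "markedly slower" are all assumptions.

```python
import math


class TypeInTimer:
    """Keeps summary statistics of type-in delays; records only slow outliers."""

    def __init__(self, threshold=2.0):
        self.threshold = threshold  # assumed cutoff, in standard deviations
        self.n = 0                  # number of delays seen so far
        self.mean = 0.0             # running mean of the delays
        self._m2 = 0.0              # running sum of squared deviations
        self.slow = []              # (event index, seconds) for slow type-ins

    def observe(self, seconds):
        # Compare the new delay with the statistics of earlier type-ins.
        if self.n >= 2:
            std = math.sqrt(self._m2 / (self.n - 1))
            if seconds > self.mean + self.threshold * std:
                self.slow.append((self.n, seconds))
        # Update the running mean and variance without storing the value.
        self.n += 1
        delta = seconds - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (seconds - self.mean)


timer = TypeInTimer()
for delay in [2.0, 3.0, 2.5, 30.0, 2.0]:
    timer.observe(delay)
```

After these five delays only the fourth type-in is recorded, so `timer.slow` is `[(3, 30.0)]`; the remaining values survive only through the running mean and variance. Whether a recorded outlier should also be folded into the statistics, as it is here, is a design choice the note leaves open.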